Text Rendering Notes
A summary of observations about text rendering!
This page is the result of an on-going process of planning a refactor of Inkscape's text handling.
General Comments
Our software stack relies on the FreeType, HarfBuzz, Pango, and Cairo libraries for rendering text.
- FreeType: Access to font internals.
- HarfBuzz: Converts characters to glyphs (i.e. shaping). (Used indirectly via Pango.)
- Pango: Determines best fonts to render text for given style and characters.
- Cairo: Rendering of glyphs.
The boundaries between these libraries is somewhat murky and often one can do the same things using routines from different libraries.
HarfBuzz is under active development and is used by many major pieces of software. FreeType has some development. Pango and Cairo are poorly maintained at the moment. See https://blogs.gnome.org/mclasen/2019/05/25/pango-future-directions/.
Changes for Inkscape
A number of changes we should make to Inkscape's code even if it is not completely rewritten.
Font Hash
Currently Inkscape uses a custom hash of the Pango Font descriptor to index fonts in a font map. Pango has a built in hash which we should probably use instead. This font hash includes "gravity" (glyph orientation, important for vertical text where, for example, the letter 'A' can be rendered upright or sideways) which is missing in our custom hash.
Remove USE_PANGO_WIN32
The use of USE_PANGO_WIN32 (directly using Window fonts) was disabled in 2011. Some functionality (e.g. variable fonts) aren't implemented for PANGO_WIN32.
Remove Support for Bitmap Fonts
Pango has dropped support for bitmap fonts (and type 1 fonts) with Pango 1.44. (Bitmap glyphs are still supported inside OpenType fonts.) I'm guessing that bitmap fonts haven't worked for a long time.
General Text Layout Algorithm
Inputs
- A list of spans: Text with unique style and/or positioning attributes.
- A list of shapes: Possibly an infinitely long line.
Steps
- Use Pango to itemize spans: This breaks spans into smaller pieces based on font, script direction (right-to-left, left-to-right), etc.
- The designated font may be missing all the glyphs needed to render the text; Pango will find the best alternative font that has the missing glyphs and create an item for them.
- Shape the Pango items to determine which glyphs with positions should be used to render the text. Either Pango or HarfBuzz can be used (Pango uses HarfBuzz under the hood).
- Create Character, Glyph, Cluster mappings. These will be needed for text editing. Use a specialized iterator to walk through maps.
- Layout items in allocated space, determining glyph positions. Must handle SVG kerning, text-length attributes, white-space (spaces, tabs, line-returns), etc.
- Locate start text insertion point (baseline of line of text).
- Can be from 'x' and 'y' attributes or calculated from shape and ascender height.
- Calculate chunks for line.
- A chunk is an unbroken section of a line. Usually there is one chunk per line but a line may be divided into multiple chunks if a shape has holes.
- A chunk can be of infinite width (or height for vertical text) in the case of SVG 1.1 text.
- Fit items into chunks.
- If an item doesn't fit, attempt to break the item into two items, one that fits into the chunk and one that starts the next chunk.
- If an item that fits into a shape has an ascent greater than the initial ascent, move the line down and restart find and fill chunks.
- Move to next line position and repeat until all text is laid out or there is no more space in the shapes.
- Locate start text insertion point (baseline of line of text).
- Apply SVG positioning attributes.
- Render glyphs. We use FreeType to extract out glyph paths. We could use Cairo directly but as we need access to the paths when converting text to path, it doesn't reduce code.
- Determine cursor position (with link via iterator into Character, Glyph, and Cluster maps). Cursor should use cluster position with fractional adjustment based on character number within cluster.
Our current code gets the PangoFontDescription from PangoItem.PangoAnalysis->PangoFont and uses it as an argument for our FontFactory::Face(). We save a copy of the PangoItem in our own PangoItemInfo structure before the PangoItem glist is destroyed. It is not clear if the PangoFont exists afterwards (PangoItem contains a PangoAnalysis but PangoAnalyis contains a pointer to PangoFont). FontFactory::Face() takes the PangoFontDescription, sets a fixed size, and uses it to search for an already existing font_instance, if not found it calls pango_font_map_load_font() to get the PangoFont corresponding to the description and create a new font_instance for the face. The new instance calls InstallFace(PangoFont), if this fails it tries to load a generic sans-serif face. Once the font_instance is found or created, font_instance->InitTheFace() is called. InitTheFace() finds the underlying FreeType font and reads various OpenType tables. InitTheFace() is called from a variety of functions. It's functionality probably should be handled during constructing a font_instance.
Itemization
Taking a text and breaking it into blocks that have the same properties.
Breaking text into blocks is a two step process:
- Break text into blocks based on SVG requirements. For example, any new positioning attribute creates a new block that must be shaped independently of previous characters. We must do this ourselves.
- Break text into blocks based on language, direction, etc. requirements. This is handle by Pango.
Proper shaping often requires knowledge of the characters before and after the text being shaped. For example, a user might want to change the color of a character in the middle of an Arabic word. Arabic uses different glyphs depending on if the character is at the beginning, in the middle, or at the end of a word. If the character is shaped out of context, the start form of the glyph will always be used. By putting it in context (using pango_shape_full()
), the correct glyph will be chosen. (This was fixed for 1.1.)
Inkscape currently (1.1) shapes each SVG text node independently. (For example, in <text>AA<tspan>BB<tspan/>CC</text>, there are three text nodes, one for 'AA', one for 'BB', and one for 'CC'.) With pango_shape_full()
this seems to handle correctly glyph selection for Arabic text. But there are situations where this does not work. For example, if one wanted to change the color of a mark (accent), the mark would not be properly placed over the base character:
m̃ vs. m̃ (View in Firefox.)
When Pango itemizes text for Pango::Layout, it uses only attributes that should affect itemization:
- Language
- Font family, style, weight, variant, stretch, size, scale, gravity, gravity hint, fallback
- Letter spacing
- Baseline shift
Before shaping, it adds back:
- Allow breaks
- Insert hyphens
- OpenType font features
- Show (how to render invisible characters)
The text is finally rendered with all attributes which adds back:
- Foreground color, alpha
- Background color, alpha
- Underline, color
- Strike through, color
See pango_layout_check_lines() for how Pango::Layout does itemization (and line layout).
Shaping
Mapping characters into glyphs. This includes determining relative offsets between glyphs that belong to the same cluster.
HarfBuzz vs. Pango Coordinates
- HarfBuzz: y points upward.
- Pango: y points downward for horizontal text, to the right for vertical text. (Pango handles vertical text as if it was horizontal. The user must rotate the result.)
To translate between HarfBuzz and Pango shaping output:
Pre Pango 1.48.1:
- Horizontal text (and vertical sideways text): invert y directions.
- Vertical text: dy (HarfBuzz) = dx (Pango) - width (Pango); dx (HarfBuzz) = dy (Pango) Note: Requiring the '-width' is due to a Pango bug.
Pango 1.48.1 and after: (Diff)
- Horizontal text: unchanged.
- Vertical text: shifted by vertical x_origin, y_origin (See hb_ot_get_glyph_v_origin() in hb-ot-font.cc.)
origin_x = glyph_h_advance/2 origin_y = if VORG table exists and font is in CFF format: VORG.y_origin else if extents exist: y_bearing + top_side_bearing else font ascender (i.e. font wide ascent)
See this note for an explanation of using x_origin and y_origin to correct glyph placement (and why this causes problems).
(Note: in pango_renderer_draw_layout_line(), glyphs are shifted by logical_rect.y + logical_rect.height / 2 if PANGO_ANALYSIS_FLAG_CENTERED_BASELINE is set.)
Shaping to rendering
- East -> Vertical text with the glyphs oriented base down.
- South -> Vertical text with the glyphs oriented base to the left (sideways).
The following applies to pre-Pango 1.48.2!!!
Pango to Cairo glyph
Direction | offset_x | offset_y | advance |
---|---|---|---|
East | -dy | width - dx | y (width) |
South | -dy | dx | y (width) |
Horizontal | dx | dy | x (width) |
HarfBuzz to Cairo glyph
Direction | offset_x | offset_y | advance |
---|---|---|---|
East | dx | -dy | y (y_advance) |
South | dy | dx | y (x_advance) |
Horizontal | dx | -dy | x (x_advance) |
Pango to FreeType glyph
Direction | offset_x | offset_y | advance |
---|---|---|---|
East | width - dx | -dy | x (width) |
South | dx | -dy | x (width) |
Horizontal | dx | -dy | x (width) |
Shaping Notes
General
- HarfBuzz has a baseline API. Older versions of HarfBuzz incorrectly report success in reading non-existent baselines. This has been fixed.
- Shaping returns glyphs in visual order (left-to-right). For right-to-left text we must reverse them (SVG dictates glyphs are to be drawn in character order). HarfBuzz has some functions to reverse order that may be useful.
- In practice, this may not be important. Inkscape currently draws consecutive glyphs that have the same style/font together (try adding a stroke to characters that have a negative character spacing so they overlap). This violates the spec.
- Proper shaping requires knowledge of characters before and after those being shaped. Both Pango and HarfBuzz have mechanisms for this (e.g. pango_shape_full()).
- Can not directly use FreeType font metrics for advance as it fails for vertical text if font does not contain vertical metrics (e.g. for non-spacing marks).
Vertical Text
Vertical text is the Achilles' heel of text rendering! PangoLayout and pango-view show rendering errors for fonts without vertical metrics and for fonts with vertical metrics when used with accents (Pango 1.48.1 through 1.48.3).
- PangoLayout and pango-view create a Cairo font with the FC_VERTICAL_LAYOUT true (Pango 1.48.3 and earlier). This causes Cairo to shift glyphs down and to the left. Recent versions of Pango use HarfBuzz for shaping which uses horizontal metrics and thus should not set FC_VERTICAL_LAYOUT to true. A "fix" merged for 1.48.1 attempts to undo the shift but this fails for fonts without vertical metrics. A better fix (by moi) was merged in time for 1.48.4.
- FreeType has a flag FT_LOAD_VERTICAL_LAYOUT for FT_Load_Glyph(). This should not be used if FT_HAS_VERTICAL is not true according to the documentation. (And shouldn't be used with HarfBuzz shaping.)
- Pango logical_rect and hb_font_extents use "Win Ascent" and "Win Descent" values (FontForge OS/2 Metrics Tab), which can be greater than em-size height. OS/2 will clip rendering to this region. Super-confusing! Don't use Pango logical_rect!
- For vertical text:
ink_rect.width | = | -hb_glyph_extents.height |
ink_rect.height | = | hb_glyph_extents.width |
- HarfBuzz positions glyphs differently depending on if a font has vertical metrics or not, and in the latter case, depending on if "FreeType" or "OpenType" functions are enabled.
- It is desirable that Inkscape position glyphs in the same way regardless of if a font has vertical metrics or not. The preffered placement is that for fonts with vertical metrics. Glyphs from fonts without vertical metrics can be shifted using the y-bearing information from HarfBuzz (distance from glyph ink top to alphabetic baseline). Pango lacks direct access to this value as the function PangoFont::get_glyph_ink_extents are for the rotated glyph and are relative to the vertical central baseline.
Right to Left Script
Pango and HarfBuzz handle right-to-left scripts (Arabic, Hebrew, etc.) fairly well but there are a number of complications. There are three major complications:
- Embedded left-to-right scripts in right-to-left text and visa-versa.
- Overall paragraph direction effects how neutral characters are placed. For example '!' will switch sides: (right-to-left top, left-to-right bottom): A!vs.A!
- SVG dictates that glyphs are draw in logical order, that is in the order their corresponding characters are embedded in the SVG file, while Pango and HarfBuzz return glyphs in visual order, that is how they are physically positioned from left-to-right. Swapping glyphs to be in the proper order is non-trivial.
An input string can contain a mixture of characters from scripts with different directions (i.e. "اَلْعَرَبِيَّةُ Français עִבְרִית English!"). Pango will split the different scripts into different Pango::Items. When rendering an item that has a direction opposite of the base (paragraph) direction, one must move the glyph reference point the width of the item. Drawing the glyphs will move the glyph reference point back to the start. Then one must again move the glyph reference point to the end of the item to prepare to draw the next item.
Note: We do not support the unicode-bidi CSS property and thus effectively support only one level of direction change. Use of unicode-bidi by normal users is strongly discouraged by the CSS spec.
Clusters
When adjusting glyphs, especially in vertical-upright text, we should move glyphs in the same cluster by the same amount.
A cluster is a sequence of characters that need to be treated as a single, indivisible unit for text layout (see explanation in HarfBuzz documentation). HarfBuzz has three 'levels' for clustering:
- Level 0: Clusters include base character plus all marks and modifier symbols. This makes it easy to shift glyphs such as moving glyphs in vertical upright text down or to insert letter spacing. This is the default in HarfBuzz (to match Window's shaping engine).
- Level 1: Base characters and marks are in separate clusters. This allows marks to be colored different from base characters. This is what is now used in Pango. (It appears that pre-using HarfBuzz, Pango's behavior matched level 0.)
- Level 2: Ligatures formation and glyph decomposition does not merge characters/glyphs into a cluster.
For example, an 'a' with a 'grave' accent can be (but usually is not) composed of two separate glyphs, which would then belong to the same cluster in level 0 but in separate clusters in level 1. With both level 0 and level 1, the two characters in 'fi' are in the same cluster if a ligature is used in rendering. Complex scripts (such as south-Asian) can have a many-character to many-glyph mapping within one cluster.
The motivation for switching to level 1 by Pango is to allow the user more control over the rendering of glyphs. For example, marks can be given a different color than the base characters:
اَلْعَرَبِيَّةُ
In Firefox, the marks in the word "Arabic" (in Arabic) are shown in color. (Chrome only supports changing the base character color [example, the purple character] with marks colored the same as the base.) This recoloring is highly font and language dependent. The markup tries to color the two stacked marks above differently but this fails in Firefox with the font Estedad-VF as they are rendered by one glyph.
In the following, the circle in the Japanese word "パン" should be red but as fonts include the pre-composed glyph 'パ' it is rendered as one glyph. The same thing happens with 'a' and 'tilde'; it does not happen with an 'x' and 'tilde' as that combination is not a Unicode character (there is unlikely to be a pre-composed glyph for it). One can sometimes block the use of a pre-composed glyph by inserting a "zero-width non-joiner" Unicode character between the a base character and a mark as done with the 'e' and 'tilde' (this has undesirable side affects when tried with the Arabic text above as it also breaks the proper shaping of the base characters on either side).
パン ãx̃ẽ
But using level 1 has drawbacks. One loses valuable information about which glyphs go together to form groups that should be moved together. The use by Pango of level 1 was responsible for non-spacing marks being misplaced by Pango::Layout if letter-spacing is not zero. See bug report. This has now been fixed for non-spacing marks. It might still be a problem for spacing marks. If we use level 1 we'll need to do extra work during the layout stage.
Currently Inkscape does some limited guessing as to which glyphs belong together (basically just that non-spacing marks are considered to belong to the preceding glyphs). We actually have quite a bit of code to handle clusters. Note that as Pango uses level 1, the 'is_cluster_start' variable (Pango::GlyphInfo::get_attr()) is true for marks.
Clusters can also help with positioning the cursor so that one can indicate which character within a cluster is subject to editing. For example, if 'ffi' is represented by one glyph and thus one cluster, one can position the cursor before the cluster (before the first 'f'), one-third of the way into the cluster (between the two 'f's), two-thirds of the way into the cluster (between the second 'f' and the 'i') or at the end of the cluster. In this case, the cursor position closely indicates which character is being edited, with complex scripts, this might not be the case but still will give a visual indication that one is "moving" through the cluster. (It seems that some effort has been made to actually making it difficult to edit characters that make up a grapheme. It was actually quite difficult to construct the above example as once you've added a mark to a base character, most programs treat them as one unit.)
HarfBuzz includes some functions to reverse the order of glyphs/clusters. These might be useful as SVG dictates that glyphs should be drawn in character order while the shaper return glyphs in visual order (which is opposite for right-to-left scripts).
Glyph Selection
To do.
Font Features
To do.
Font Metrics
Ascent and Descent
Locating the ascent and descent of a font is important for laying out text. (See http://tavmjong.free.fr/blog/?p=1632.) Pango, HarfBuzz, and FreeType all have methods to extract out this data from a font.
OpenType has two tables that contain these values:
- OS/2 (Windows)
- sTypeAscender: Distance from baseline to top of em box (no longer required by OpenType spec).
- sTypoDescender: Distance from baseline to bottom of em box, normally negative.
- sTypeLineGap: Default spacing between lines (also called leading). Ignored by DTP software including Inkscape.
- usWinAscent: Used for clipping in Windows.
- usWinDescent: Used for clipping in Windows.
- sxHeight: Nominally the distance between the baseline and the top of the letter 'x'.
- sCapHeight: Nominally the distance between the baseline and the top of capitol letters like 'X'.
- HHEA (Mac)
- ascent: Nominally, the same as usWinAscent.
- descent: Nominally, the same as usWinDescent.
- lineGap: Nominally, the default spacing between lines.
CSS dictates that one should use the OS/2 sTypoAscender and sTypeDescender values if available, falling back to the HHEA Ascent and Descent values if missing. See https://www.w3.org/TR/CSS2/visudet.html#sTypoAscender. See also the The Webfont Strategy (2019) section of https://glyphsapp.com/learn/vertical-metrics.
Fonts can request that the Typo values be given preference, but popular fonts like DejaVu Sans do not set the request flag. Pango::FontMetrics::get_ascent() and get_descent(), and hb_font_get_extents_for_direction() return the HHEA numbers if the flag is not set (see hb-ot-metrics.cc, _hb_ot_metrics_get_position_common()
). To ensure that we use the Typo values we must resort to using FreeType to read the 'OS/2' table (with fallback to the HHEA table). If vertical metrics are requested, they are read from the VHEA table (which is unlikely to exist for most fonts).
Note that in Pango, the logical_rect 'y' is the negative of the ascender value and the 'height' is the ascender - the descender (where the descender is negative). The width is the glyph advance. (See pangofc-font.c
).
x-height
The distance between the alphabetic baseline and the top of the 'x' glyph is known as the x-height and is one of the fundamental units of SVG. Fonts can set the sxHeight value in the OS/2 table (since version 2). If a font (e.g. DejaVu Sans) does not set the value, one can attempt to measure the value using glyph data. HarfBuzz and FreeType have methods to easily get a glyph's bounding box. Pango could do it but in a convoluted way (requiring itemizing).
Baselines
With Latin scripts, glyphs are normally aligned to the alphabetic baseline. South Asian scripts usually use a hanging baseline. Japanese and Chinese use a center baseline in vertical layouts. SVG allows one to specify which baseline should be used. Mathematical equations can use a mathematical baseline. OpenType has a table BASE for baseline values which can be read by HarfBuzz; neither FreeType or Pango can. HarfBuzz has a bug, fixed in 2.7.3, where true is returned by hb_ot_layout_get_baseline() even if an entry is missing in the baseline OpentType table.
If the values are missing in a font, one can try to measure the values by looking at the ink rectangle (bounding box) of specific glyphs (for example the minus sign can be used to define the mathematical baseline). Both FreeType and HarfBuzz can relatively easily be used to find the ink rectangle of a specific glyph. Pango could but it's complicated (one would need to itemize some text first).
Underline and Strike-Through
Fonts can define the position and width of underlines and strike-throughs. Underline is defined in the post (PostScript) table while strike-through is defined in the OS/2 table. (FontForge handles Underline on the General page while Strikeout is under the OS/2 page's Sub/Super tab.)
FreeType has access to these values in the appropriate tables. Pango also gives access to them (Pango::FontMetrics). HarfBuzz does not give access to these values.
Slant
Inkscape uses the slant for Italic fonts to match the text cursor to the text slope. The slant is stored the hhea and vhea OpenType tables. FreeType can access these tables; Pango and HarfBuzz cannot. The post table also has an italicAngle value, accessible by FreeType.
Vertical Text, Down and Dirty
Harbuzz 4.1 introduces some changes to vertical text which need to be accounted for by Inkscape. This prompted a closer look at handling vertical text. This is quite complicated as it involves the interactions of four different libraries (HarfBuzz, Pango, Cairo, and FreeType) and two different programs (Inkscape and FontForge or whatever program is used to generate the font files).
Fonts
The font file can be a TrueType or an OpenType font. The actual difference between the two is very murky and most software no longer tries to distinguish between the two. Both font formats contain a set of tables with the font data (glyph outlines, positioning information, etc.). These table definitions are shared between the formats; the difference being mostly which tables are typically found in which format. The most important tables for vertical text are:
- hhea and hmtx: Contain horizontal font metrics. Almost always present.
- vhea and vmtx: Contain vertical font metrics. Usually missing unless a font is designed with vertical text in mind. If present, vertical text will be laid out correctly.
- GDEF, GPOS, GSUB: These tables contain information on which glyphs should be used and how they should be positioned.
- How marks should be placed relative to a base glyph.
- Substitutions of ligatures.
- Substitution of special vertical text forms.
- etc.
- BASE: Alphabetic, mathematical, CJK, etc. baseline data.
FreeType
We use FreeType to extract out font data that we need inside Inkscape including the glyph outlines (needed, for example, when stroking a glyph). HarfBuzz and Pango also use FreeType, although their reliance on FreeType is slowly being removed.
HarfBuzz
HarfBuzz is responsible for "shaping" a string of characters, that is converting a string of characters into an array of glyphs with positions that can be used to render a section of text. It takes a string of Unicode characters and a font file as input. Until very recently, it had no rendering support.
Pango
Pango is a higher level text layout program. Recent versions rely on HarfBuzz for shaping. Pango handles things like font substitution (finding the best matching font when the original font is not available) and embedding left-to-right text inside right-to-left text.
Cairo
We use Cairo to actually render the glyphs using information from Pango for where the glyphs are to be placed.
Inkscape
We apply some corrections to the output of Pango for vertical text, things like how the text is aligned to a reference point, fixing the advance to one em-box when vertical metrics are not available, and shifting the glyphs to ensure even spacing between glyph baselines.
Terminology
- Face or Font Face
- A collection of glyphs with the same style (family, slant, weigth, etc.) used to render text.
- Font
- A font face with a specified size.
- Em Box
- Nominally a box with the width and height equivalent to the width of the 'M' glyph. The 'em' box is scaled to the CSS 'font-size'. It is the basic unit in font design, usually 1000 (OpenType or 1024/2048 (TrueType) designer units in width/height.
- Baseline
- The point at which glyphs are aligned along a line of text. For horizontal text, this is usually the Alphabetic baseline which is at the bottom of most capital letters. Vertical text usually uses a Center baseline. Alternative baselines include the "Hanging" baseline, used by South-Asian scripts and the "Math" baseline. Baselines are important when mixing different sized fonts.
- Ascent/Ascender
- The distance from the alphabetic baseline to the top of the font. What is meant by the top of the font depends on context. In CSS, it is the top of the 'em' box. Fonts may contain ascent values that differ in meaning. In particular, the OS/2 typo table value winAscent is used on Windows to clip rendering. The OpenType specification use to require the ascent (typoAscender) + descent (typoDescender) equal the em box height. It no longer does so. See: https://glyphsapp.com/learn/vertical-metrics. CSS recommends using the OpenType sTypoAscender value. See https://www.w3.org/TR/CSS2/visudet.html#sTypoAscender.
- Descent/Descender
- The distance from the alphabetic baseline to the bottom of the font. See Ascent.
- Gap/Line spacing
- The extra space between lines of text. Fonts make contain information on the recommended spacing (OS/2 typeLineGap). This is not used by CSS.
- x-height
- The distance from the baseline to the top of the lower-case 'x' glyph (or equivalent).