MS RFC 98: Label/Text Rendering Overhaul

Date:2013/06
Author:Thomas Bonfort
Contact:thomas.bonfort@gmail.com
Status:Adopted
Version:MapServer 7.0

1. The current situation

1.1 Text rendering pipeline

When a feature needs to be labelled, the following (simplified) steps are undertaken:

  • the string eventually goes through iconv to be converted to utf8

  • the string goes through fribidi to reorder the glyphs from “logical order” to “visual order”, to support languages that are written from right to left. e.g. supposing capital letters are arabic letters, the text which is stored in the feature as this is some ARABIC text is transformed into this is some CIBARA text in order to be fed transparently to the renderers that currently layout glyphs from left to right

  • the string goes through line breaking, which is currently broken for RTL scripts for line breaking as we render (for arabic \n text)

    TXET
    CIBARA
    

    instead of the required

    CIBARA
    TXET
    
  • the string goes through alignment, which is currently hacky and unprecise (thanks to yours faithfully) as we (try to) achieve alignment by padding with spaces instead of using precise offsets per line

  • the string is then passed to the renderers as-is, who are responsible for the whole laying out of each glyph

1.2 Limitations

While the current situation is simple to understand, and works reasonable well for latin languages, it has a number of shortcomings that this RFC aims to resolve:

  • All the renderers are duplicating layout logic:
    • splitting the utf8 string into individual glyphs
    • looking up a given glyph inside a font file
    • laying out a glyph with respect to the previous one (this includes spacing between subsequent characters, or determining how many pixels should be vertically added when a linebreak is encountered)
  • While latin languages have a 1-to-1 mapping between string characters and font glyphs, this is not the case for complex languages, where ligature glyphs need to be inserted depending on context (see the wikipedia complex text layout page for a more complete explanation). While this is partly taken care of for us by fribidi for arabic (which has a shaping algorithm that inserts ligatures for us), we currently fail hard for other complex text languages.
  • We don’t support double spacing or line spacing at all, and our text alignment (center, right) is imprecise, as implementation for these would need to be added to each individual renderer.

2. Proposed Overhaul

For the big picture, see the state of text rendering.

In short, instead of passing a string of text and a starting position to the renderers, we pass a list of glyphs (i.e. a glyph is a specific entry inside a specific font file) along with their precise positioning, very similarly to what we do currently when rendering FOLLOW labels, as is already outlined in bug 3611 . The actual shaping and layout happens in a single mapserver function, and the renderers just dumbly place glyphs where they are told to.

2.1 Architecture Implications

Architecturaly, this modification is significant, as the whole label rendering chain needs to be refactored in order to take a list of positioned glyphs into account instead of just plain text strings.

2.1.1 Dependencies

  • Freetype becomes a central mapserver dependency, as we need the information about glyph sizes at a higher level than currently (where freetype is a dependency of the individual renderers). We also need a single central glyph cache to be used across harfbuzz and the renderers. fontsetObj will likely need to be extended to be more than a simple filename hashtable.
  • Fribidi (which has some thread safety issues) is kept as the library that handles the bidi algorithm for us. We do not use it for shaping anymore.
  • Bring in Harfbuzz to support complex script shaping (to simplify: the tool that will insert ligatures between characters). Harfbuzz algorithms will be applied on text strings that have been determined to not be latin (i.e. the slowdown induced by harfbuzz is limited to those languages that actually require shaping).
  • UTHash (http://troydhanson.github.io/uthash/) is a header only hashtable implementation that works with arbitrary keys and values, and is used for accessing cached fonts and glyphs. Given some performance testing, we might want to replace our own hashtable implementation with this one some day.

Note that the Fribidi and Harfbuzz dependencies will remain optional and can be disablable at compile time for those only treating latin scripts. Harfbuzz and Fribidi will need to be enabled or disabled together.

2.1.2 Font and Glyph cache

(fontcache.c) A global (thread protected if needed) glyph and font cache will ensure that cached glyphs are reusable across multiple requests for the fastcgi case, but in turn requires some thread-level protection and probably some pruning in order for it to remain of reasonable size. Some APIs have changed in order to have the fontcache accessible. The fontcache contains caches for:

  • font faces (i.e. the representation of of a truetype file)
  • glyphs (i.e. the metrics of an individual glyph at a given size)

2.1.2 Transforming a string of text into a list of positioned glyphs

This step will be added to mapserver, and consists in coordinating the outputs from freetype, the bidi algorithm, harfbuzz, line-breaking, text alignment, and in the future text and line spacing. This step could be in most part implemented transparently through pango, however pango is hard to support across platforms, and, to my knowledge, has a hard dependency on fontconfig which is incompatible with our fontset approach.

Text is represented by a “textPathObj” which is basically a list of positioned glyphs. (e.g the word “Label” at size 10 for an arial font is represented by arial font’s glyph “L” at position (x,y)= (0,0), glyph “a” at position (10,0) , “b” at (18,0), etc...). Multiline text is handled transparently by having glyphs positioned at different y values. A textPath can be either “absolute” (i.e. the glyph positions are in absolute image coordinates, used to position glyphs for angle follow labels), or “relative”, in which case they must be offset by their labeling point.

All the shaping happens in textlayout.c, who’s principal role is to take a string of text as input, and return a list of positioned glyphs as output. The input string goes through multiple steps, and is plit into multiple run. Each run will have a distinct line number, bidi direction, and script “language”.

As an example, we’ll be working with the input unicode string “this is some text in english, ARABIC and JAPANESE”. Capital letters are used to denote non latin glyphs, also note that ARABIC is stored in logical (=reading) order, whereas it would be rendered as CIBARA.

  • iconv encoding conversion to convert the string to unicode
run1 = "this is some text in english, ARABIC and JAPANESE", line=0
  • line wrapping: break on wrap character, break long lines on spaces
run1 = "this is some text in english,", line=0
run2 = "ARABIC and JAPANESE", line=1
  • bidi levels, each run has a single bidi direction (i.e. left-to-right or right-to-left)
run1 = "this is some text in english,", line=0, direction=LTR
run2 = "ARABIC" line=1, direction=RTL
run3 = " and JAPANESE", line=1, direction = LTR
  • script detection is applied to enable language dependent shaping, and also to refine which fonts will be used (more on that later)
run1 = "this is some text in english,", line=0, direction=LTR, script=latin
run2 = "ARABIC" line=1, direction=RTL, script=arabic
run3 = " and " line=1, direction=LTR, script=latin
run4 = "JAPANESE" line=1, direction=LTR, script=hiragana
  • for each run, we select which font should be used, in order to use the same font inside a given run. MS RFC 80: Font Fallback Support allowed to specify multiple fonts for a LABEL, this has been extended to be able to fine tune which fonts are to be preferably used for a given script:
LABEL "arialuni,arial,cjk,arabic"

can now be written prefixed by a script identifier, i.e.

  LABEL "arialuni,en:arial,ja:cjk,ar:arabic"

This is needed as there is and will be overlap between font glyph
coverages, and it should be possible to prioritize which font is used
for which language.
  • Each run is then fed into harfbuzz, which returns a list of positioned glyphs. The number of returned glyphs is not meant to be identical to the number of glyphs we had in the unicode string, and are ordered from left to right. As a speedup, a run who’s script is “latin” is not fed through harfbuzz, but instead uses cached glyph advances.
  • The glyphs of each run are reassembled to account for line numbers and run positions (e.g. run 3 is offset down by one line, and placed to the right of run 2)
  • Each line is horizontally offseted to account for ALIGN. LABEL ALIGN now stops defaulting to LEFT, so right-to-left runs will be right aligned instead of left as is now.

2.1.3 Handling glyphs at rendering phase

The labelcache and the renderers will need to be updated to work with a list of glyphs. Changes here are extensive but should remain conceptually simple. Individual renderers are substantially simplified.

Work has been done to trim down the labelcache computations as much as possible:

When inserting features into the labelcache:

  • We’ll insert a reference to the original labelObj instead of a copy if the labelObj and it’s child styleObjs don’t contain any bindings. This cuts dow on memory usage when attribute bindings aren’t in use.
  • We don’t insert features that will never get rendered (e.g. out of scale, too large for their feature (minfeaturesize keyword) )

At the msDrawLabelCache phase:

  • We delay computation of the label text bounding box to after we have checked conditions that would cause it not be renderered, i.e.

    • if they have a MINDISTANCE set and a neighbouring label with identical text has already been rendered
    • for labels without markers, we first check that the labelpoint doesn’t collide with an existing label.
  • The Collision detection has been optimized:

    • We keep a list of rendered labels and loop through those instead of checking the status of all the labels in the labelcache for each member
    • The bounding metrics for a label has been cut down from a full shapeObj to a struct containing a bounding rect and an optional lineObj. For non rotated labels, there’s no information needed more than the bounding box, which makes intersection detection much easier for two labels like this (i.e. the overlapping of these two labels is the same as the overlapping of their bounding boxes, no need to go into further geometric intersection primitives).

    The speedups for these changes are extremely important for cluttered maps, c.f. https://plus.google.com/u/0/118271009221580171800/posts/PrwhFYSkhea (e.g. rendering time goes from 800 to 1 second for 500.000 labels)

2.2 Backwards Compatibility

  • At a high (mapfile) level, none expected.
  • Some mapscript APIs may change or may need to be wrapped. (TBD)
  • Some subtle but widespread changes in character placement.
  • MS RFC 99: Remove support for GD renderer is a collateral of most of these changes

2.3 Performance Implications

Potentially numerous:

  • Existing performance on latin only text will need to be kept to the current level, by detecting and shortcutting these cases. If done correctly we may actually be able to speed up latin text rendering marginally.
  • For other scripts, the cost of going through harfbuzz will come with a performance penalty.

3. Miscellaneous

3.2 Voting History

+1 from MikeS, StephanM, TomK, JeffM, SteveW, SteveL and ThomasB