[XeTeX] anti-xunicode ;-)

Sun Jul 23 23:46:46 CEST 2006

On 23 Jul 2006, at 6:07 pm, Ralf Stubner wrote:
>
> It looks as if XeTeX uses the characters as they are input. In a font
> with E, Edotbelow, acutecomb and dotbelowcomb, the Edotbelow glyph is
> used when I input <E><dotbelowcomb><acutecomb>, but not for
> <E><acutecomb><dotbelowcomb>, even though the first form is the
> canonically reordered form of the second.

Right; xetex will simply render the given character sequence using  
the glyphs and rules in the font. On Mac OS X, many AAT fonts are set  
up so that the precomposed glyphs will be used for various  
canonically-equivalent sequences, but this is a font feature rather  
than something hard-wired into the rendering system. With OpenType  
fonts, I think it may be less common for the font to explicitly  
support all equivalent sequences.
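To make the "canonically equivalent" point concrete, here is a small sketch using Python's standard `unicodedata` module (an illustration only, not anything XeTeX itself does). It shows why <E><dotbelowcomb><acutecomb> is the canonical order: U+0323 COMBINING DOT BELOW has a lower canonical combining class than U+0301 COMBINING ACUTE ACCENT, so canonical reordering (as applied by NFD) puts the dot below first.

```python
import unicodedata

# The two input orders discussed above.
seq_acute_first = "E\u0301\u0323"  # <E><acutecomb><dotbelowcomb>
seq_dot_first = "E\u0323\u0301"    # <E><dotbelowcomb><acutecomb>

# Canonical combining classes: marks with lower classes sort first
# during canonical reordering.
for mark in ("\u0323", "\u0301"):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}: "
          f"ccc={unicodedata.combining(mark)}")

# NFD applies canonical reordering without composing, so both inputs
# end up in the same order: dot below (220) before acute (230).
print(unicodedata.normalize("NFD", seq_acute_first) == seq_dot_first)
print(unicodedata.normalize("NFD", seq_dot_first) == seq_dot_first)
```

A renderer that only matches the literal sequence against the font's rules will treat these two strings differently, even though Unicode considers them equivalent.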

You could use a font-mapping to apply NFC normalization to the text,
so that the precomposed Unicode characters will always be used,
regardless of the actual character sequences in the input; but of
course if you happen to have a font that has the combining marks but
*lacks* the corresponding precomposed characters, this would be a bad
thing, since normalization would replace sequences the font can
render with characters it cannot.
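As a rough illustration of what NFC does here, and of the caveat, the sketch below uses Python's `unicodedata` as a stand-in for a TECkit normalization mapping (it is not TECkit syntax). Both input orders normalize to U+1EB8 (E with dot below) followed by a combining acute. The helper `normalize_for_font` and the coverage sets are hypothetical: they assume you know which characters the font covers, and fall back to NFD when the font lacks a precomposed character that NFC would produce. That fallback is one possible design choice, not something XeTeX or TECkit does automatically.

```python
import unicodedata

# Hypothetical font coverage sets, for illustration only.
FONT_WITH_PRECOMPOSED = frozenset({"E", "\u1EB8", "\u0301", "\u0323"})
FONT_MARKS_ONLY = frozenset({"E", "\u0301", "\u0323"})


def normalize_for_font(text: str, font_chars: frozenset) -> str:
    """Hypothetical helper: prefer NFC, but fall back to NFD if the font
    lacks any character that NFC would produce."""
    nfc = unicodedata.normalize("NFC", text)
    if all(ch in font_chars for ch in nfc):
        return nfc
    # Decomposed form: base letter plus combining marks in canonical order.
    return unicodedata.normalize("NFD", text)


for seq in ("E\u0301\u0323", "E\u0323\u0301"):
    nfc = unicodedata.normalize("NFC", seq)
    print([f"U+{ord(c):04X}" for c in nfc])

print(normalize_for_font("E\u0301\u0323", FONT_WITH_PRECOMPOSED))
print(normalize_for_font("E\u0301\u0323", FONT_MARKS_ONLY))
```

With the marks-only font, forcing NFC would have produced U+1EB8, which that font cannot display; the decomposed sequence keeps the text renderable, provided the font positions the marks itself.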

(If you want to try this, the TECkit mapping engine has built-in  
support for normalization; don't try to write out all the rules  
explicitly!)

> Of course, proper support for
> characters like this via 'mark' or 'ccmp' is the right way to go.

Indeed.

JK