[XeTeX] XeTeX and diacritics
Peter S. Baker
psb6m at virginia.edu
Mon Mar 24 21:12:10 CET 2008
Ross Moore wrote:
>
> By the way, that junicode font looks nice.
> Its documentation has some advice that I don't know whether
> it is standard or not.
> e.g.
>
> Characters with diacritics.
> Both Unicode and MUFI contain large numbers of characters with
> diacritics. Make it a habit never to use these “precomposed”
> characters directly; rather use the “plain” character followed by
> a character from the Unicode “Combining Diacritics” range. (This
> works with Word for Windows when Uniscribe is enabled, and also
> with other OpenType-aware applications.) In almost all cases the
> application will either substitute the correct precomposed
> character or position the diacritic correctly.
>
>
> Presumably this is based upon an expectation that the combining
> characters are likely to work more often, rather than expecting
> a font to have all the precomposed ones available.
> Yet JK's remark concerning Gentium contradicts this.
>
> Furthermore, I set up xunicode.sty to use precomposed characters
> when there is a Unicode code-point allocated, with combining
> sequences as a fallback --- especially with the standard accents
> (i.e., those that occur in the older Latin-based font encodings).
>
> So my question is:
> Is there really any advantage in following the above advice?
>
> Another question is:
> Does it matter what is done at the input or macro levels?
> Does XeTeX produce the same output in the PDF whether a precomposed
> character or combining sequence is used, when there is a choice?
>
>
>
The advice in the Junicode documentation is specifically for Junicode
users, and especially for medievalists who need some of the hundreds of
precomposed characters that the Medieval Unicode Font Initiative
(http://gandalf.aksis.uib.no/mufi/) encodes in the PUA (the Unicode
Private Use Area). I strongly oppose including PUA characters directly
in documents: to keep documents portable I suggest instead using
sequences of letter + combining diacritic (+ combining diacritic ...).
Junicode uses the OpenType ccmp feature to substitute the precomposed
character when possible, and uses mark-positioning anchors as a
fallback. Both of these methods, of course, work brilliantly with
XeTeX. One of Junicode's goals is to support all of MUFI while making it
unnecessary to actually insert any PUA characters into a document.

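As an illustration of the portability point, here is a minimal Python
sketch that reports any Private Use Area characters in a piece of text.
It is not part of Junicode or XeTeX; the function name find_pua is made
up for this example, and U+E000 is used only as an arbitrary PUA code
point, not as a real MUFI assignment.

```python
import unicodedata


def find_pua(text):
    """Return (index, code point) pairs for private-use characters."""
    # General category "Co" covers the BMP Private Use Area
    # (U+E000-U+F8FF) and the two supplementary private-use planes.
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


# Letter + combining diacritics: nothing to report.
portable = "e\u0304\u0301"  # e + combining macron + combining acute

# Placeholder PUA character (hypothetical, not a real MUFI code point).
non_portable = "a\uE000b"

print(find_pua(portable))
print(find_pua(non_portable))
```

A check like this could be run over a source file before sharing it, to
confirm that it relies only on standard code points.
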
For the common combinations in the Unicode Latin ranges I expect that
most people will just use the precomposed characters, and these will be
portable enough. But even there I can see some advantages to using the
combining diacritics.

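For those common combinations, the precomposed character and the
combining sequence are canonically equivalent in Unicode, which the
following Python sketch shows using the standard unicodedata module. The
helper name same_text is an assumption made for this example, and the
sketch says nothing about what a particular font or XeTeX does with
either form.

```python
import unicodedata


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE, one code point
sequence = "e\u0301"    # e + COMBINING ACUTE ACCENT, two code points

# Different code points, but canonically equivalent text.
assert precomposed != sequence
assert same_text(precomposed, sequence)

# NFD splits the precomposed form; NFC recombines the sequence.
assert unicodedata.normalize("NFD", precomposed) == sequence
assert unicodedata.normalize("NFC", sequence) == precomposed

# A combination with no precomposed code point stays a sequence under
# NFC, so the font has to position the mark itself.
q_acute = "q\u0301"
assert unicodedata.normalize("NFC", q_acute) == q_acute
```
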
None of this (ccmp substitution or anchor positioning) works in Gentium,
because it has little or no OpenType support. How I wish it did!

Peter Baker