[XeTeX] On cross-language font selection
Jonathan Kew
jonathan_kew at sil.org
Fri Feb 23 17:54:13 CET 2007
This is a topic that has come up several times in the past few years.
My view has been (as Will suggested) that it is not possible to come
up with a comprehensive and general scheme, independent of some kind
of language markup in the source. You cannot tell purely from the
Unicode values of the characters in the text what language they
represent, or what font or other typographic features should be used.
For example, in a document that combines both Chinese and Japanese,
the same Han character could be used in both languages, but different
fonts would probably be wanted.
At another level, there is the problem of punctuation characters that
are common to many scripts and languages. For example, suppose I have
a document that mixes Hindi and English. It's easy to say "use a
Latin font for the English letters, and a Devanagari one for the
Hindi letters". But what about punctuation such as parentheses, quote
marks, question marks, etc.? The design of these will differ between a
typical Latin and Devanagari font, being harmonized with the style of
the letters. But it may not always be possible to reliably guess
which script a given character should be associated with. In many
cases, "the script of the preceding letter" would be a reasonable
guide, but it may not always be correct -- and there may not always
be any preceding letter at all!
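To make the punctuation problem concrete, here is a minimal Python sketch of the "script of the preceding letter" heuristic. The script detection is a deliberately simplified assumption (only ASCII Latin letters and the Devanagari block are recognized), and the function names are hypothetical. The point is only to show where such a guess breaks down.

```python
from typing import List, Optional

def script_of(ch: str) -> Optional[str]:
    """Return a script name for letters, or None for script-neutral characters.

    Simplified assumption: only ASCII Latin letters and the Devanagari block
    (U+0900..U+097F) are recognized; real code would use Unicode Scripts.txt.
    """
    if 0x0900 <= ord(ch) <= 0x097F:
        return "Devanagari"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return None  # parentheses, quotes, spaces, ASCII digits: no inherent script

def guess_scripts(text: str) -> List[Optional[str]]:
    """Give each character a script; neutral characters inherit the script
    of the most recent letter, or None when no letter precedes them."""
    result: List[Optional[str]] = []
    last: Optional[str] = None
    for ch in text:
        script = script_of(ch)
        if script is not None:
            last = script  # a letter: remember its script
        result.append(last)
    return result
```

With this sketch, guess_scripts('("नम")') leaves the opening parenthesis and quote as None, because no letter precedes them, while guess_scripts("a (न)") assigns the opening parenthesis to Latin even though it encloses Hindi text.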
For a web browser displaying arbitrary pages, font fallbacks are a
good thing; it's better to find a font that makes the text legible,
even if it sometimes makes choices that are typographically less than
ideal. But in the context of a professional typesetting system, I
don't want the computer guessing which font to pick for certain
ambiguous characters in my document; I want to be sure that I will
get exactly the fonts I have asked for. With this comes the
requirement that my markup must always, in some way, provide
sufficient information to unambiguously specify which font to use,
not "pick one of this collection, based on some complex heuristic for
guessing the current script".
However, I also have some good news! :) A new feature planned for
XeTeX 0.997 will make it easy to implement automatic font switching
for many simple situations, such as a mixture of Chinese and English,
with no need for embedded markup. This is *not* a general-purpose
"font collections" model, or a universal solution to multi-script
text, but should be a big help for many of the common cases people
try to implement with active characters, etc. Basically, it allows
you to "hook in" extra code (such as glue, penalties, font changes,
etc.) between characters of the text, based on character classes
assigned to each Unicode value.
In view of this, I would suggest that people not spend a lot of time
and effort on perfecting macro-level solutions just now, but wait and
see what can be done with the new facilities that 0.997 will provide.
JK