[XeTeX] default char classes
Jonathan Kew
jonathan_kew at sil.org
Wed Mar 12 19:10:25 CET 2008
On 12 Mar 2008, at 1:31 pm, Barry MacKichan wrote:
> Jonathan, you have convinced me that language markup is needed.
:-)
There are, of course, simple cases where it's possible to get away
without it, and cases where "magic" font-switching would be handy for
specific purposes. But it's very hard to design a universal, robust
system.
> I am curious about Will's question. Are there efficiency concerns in
> defining lots of large token classes?
The main concern I'd have is that I suspect that in most cases, users
of character classes and inter-char tokens will really only be
interested in a couple of scripts, and certain classes of characters
within those scripts (e.g., opening and closing punctuation). So it's
simplest for them if they define the specific classes that matter for
their application, and leave everything else in a default "other" class.
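
A minimal sketch of that approach, assuming plain XeTeX with nothing
pre-assigned to the classes used here. The class numbers 10 and 11, the
macro names, the choice of punctuation characters and the 0.25em skip are
all illustrative assumptions, not taken from any existing package:

```tex
% Turn on insertion of inter-character tokens.
\XeTeXinterchartokenstate=1

% Two application-specific classes (numbers chosen arbitrarily).
\chardef\openpunctclass=10
\chardef\closepunctclass=11

% Assign only the characters that matter; all others keep their default class.
\XeTeXcharclass"300C=\openpunctclass   % LEFT CORNER BRACKET
\XeTeXcharclass"300E=\openpunctclass   % LEFT WHITE CORNER BRACKET
\XeTeXcharclass"300D=\closepunctclass  % RIGHT CORNER BRACKET
\XeTeXcharclass"300F=\closepunctclass  % RIGHT WHITE CORNER BRACKET

% Only the pairs involving these classes and class 0 need any tokens.
\XeTeXinterchartoks 0 \openpunctclass = {\hskip 0.25em\relax}
\XeTeXinterchartoks \closepunctclass 0 = {\hskip 0.25em\relax}
```

Every character not mentioned above stays in class 0, so these two
transitions are the only ones the macro writer has to think about.
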
If we pre-assign all the Unicode characters to several dozen (at
least) classes, based on script and on other character categories --
in fact, we might easily hit 100 classes or more -- then packages
like zhspacing that care about a certain script, and consider
everything else "other", will have a lot of extra class-pairs to
consider, for no obvious benefit. That seems like an extra burden on
users/macro writers.
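
To make the extra burden concrete, here is a rough sketch under invented
assumptions: classes 0 to 99 are all in use, and a package cares only
about class 7. The class number, the total of 100, and the
\enterscript/\leavescript macros are hypothetical placeholders:

```tex
\chardef\myscriptclass=7

% Placeholders: a real package would also select its font here.
\def\enterscript{\begingroup}
\def\leavescript{\endgroup}

\newcount\otherclass
\otherclass=0
\loop
  \ifnum\otherclass=\myscriptclass \else
    % Entering the script from any other class...
    \XeTeXinterchartoks \otherclass \myscriptclass = {\enterscript}
    % ...and leaving it again for any other class.
    \XeTeXinterchartoks \myscriptclass \otherclass = {\leavescript}
  \fi
  \advance\otherclass by 1
\ifnum\otherclass<100 \repeat
```

With everything else left in class 0, the same package would need just
two such assignments; here it has to make 198 (99 other classes, in both
directions).
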
What we probably should do, as part of the xetex and xelatex formats,
is create a \newcharclass allocator (like plain TeX's \newcount,
etc), to help people manage class numbers without conflict.
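
A possible shape for such an allocator, written in the style of plain
TeX's allocation macros. This is only a sketch: the counter name and the
upper limit of 254 are assumptions (the limit depends on how many classes
the XeTeX version in use supports and which one it reserves for
boundaries), and \newcharclass is the name proposed above, not an
existing macro:

```tex
% Highest class number handed out so far; class 0 is left as the default.
\newcount\charclasscount
\charclasscount=0

% \newcharclass\name defines \name as the next unused class number.
\def\newcharclass#1{%
  \global\advance\charclasscount by 1
  \ifnum\charclasscount>254
    \errmessage{No room for a new character class}%
  \fi
  \global\chardef#1=\charclasscount
}
```

A package would then write, for example:

```tex
\newcharclass\openpunctclass
\XeTeXcharclass"300C=\openpunctclass
```

and never needs to know which number it was given.
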
If someone does want to try and implement comprehensive multi-script
automatic font switching (despite my reservations!), there's nothing
to stop them assigning all the Unicode chars to classes based on
script, and even precompiling this into a format file. (The
unicode-letters.tex file, and the Perl script that generates it -- found in
the xetex source tree -- could give some ideas how to go about this.)
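
As a rough illustration of what such bulk assignment could look like, the
sketch below assigns whole code point ranges to a class. The
\assignrange helper, the class numbers and the class names are
hypothetical, and Unicode block ranges are used as a crude stand-in for
script membership; real script data would have to be generated from the
Unicode data files:

```tex
\newcount\cp

% \assignrange{first}{last}{class}: put every code point in the
% inclusive range first..last into the given class.
\def\assignrange#1#2#3{%
  \cp=#1\relax
  \loop
    \XeTeXcharclass\cp=#3\relax
  \ifnum\cp<#2\relax
    \advance\cp by 1
  \repeat
}

\chardef\greekclass=20     % arbitrary class numbers
\chardef\cyrillicclass=21

\assignrange{"0370}{"03FF}{\greekclass}     % Greek and Coptic block
\assignrange{"0400}{"04FF}{\cyrillicclass}  % Cyrillic block
```

Placing assignments like these in the file used to build a format (run
with xetex -ini and ending in \dump) is one way they could be
precompiled, so the loops do not run on every job.
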
JK