[XeTeX] Hyphenation patterns and Unicode
Jonathan Kew
jonathan_kew at sil.org
Wed Oct 19 13:32:19 CEST 2005
On 19 Oct 2005, at 11:48 am, Nicola Vitacolonna wrote:
> Hi everybody,
> the XeTeX FAQ says that hyphenation patterns should be "true
> Unicode" files. It is not clear to me if the following (excerpt of
> a) file (for Lithuanian) is ok:
>
> \def\ltletters{
> \catcode"81=11\lccode"81="A1\uccode"81="81%A nosine
> \catcode"83=11\lccode"83="A3\uccode"83="83%C su pauksteliu
> \catcode"84=11\lccode"84="A4\uccode"84="84%E su tasku
> % etc...
> }
> \ltletters
> \patterns{
> .ap1
> .api1
> .a^^b23v
> %etc...
> }
>
This does not appear to be Unicode-compliant: it expects character
codes such as (hex) 81, 83, and 84 to be accented letters. (Because
the file uses ^^.. sequences rather than literal bytes with these
codes, XeTeX will be able to read it, but the resulting patterns
won't be correct for Unicode text.)
I assume this file was created to work with one of the 8-bit
encodings used with TeX, such as T1, which does not match the
Unicode encoding of the accented letters.
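To illustrate the mismatch: in Unicode, the codes (hex) A1, A3, and A4
are the inverted exclamation mark, the pound sign, and the currency
sign, not letters. Going by the comments in the quoted excerpt, the
three letters shown there live at U+0104/U+0105, U+010C/U+010D, and
U+0116/U+0117. A rough sketch of the equivalent Unicode declarations
follows; the macro name \ltunicodeletters is invented for this example,
and it assumes an engine such as XeTeX that accepts character codes
above "FF.

```tex
% Hypothetical Unicode counterpart of \ltletters (the name is invented here).
% Code points are inferred from the comments in the quoted excerpt.
\def\ltunicodeletters{%
  \catcode"0104=11 \lccode"0104="0105 \uccode"0104="0104 % A nosine
  \catcode"0105=11 \lccode"0105="0105 \uccode"0105="0104 % a nosine
  \catcode"010C=11 \lccode"010C="010D \uccode"010C="010C % C su pauksteliu
  \catcode"010D=11 \lccode"010D="010D \uccode"010D="010C % c su pauksteliu
  \catcode"0116=11 \lccode"0116="0117 \uccode"0116="0116 % E su tasku
  \catcode"0117=11 \lccode"0117="0117 \uccode"0117="0116 % e su tasku
  % etc...
}
```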
> I would like to add this file to language.dat, rebuild all format
> files, and use LaTeX or XeLaTeX with babel. Is this expected to
> work? Or should I use the above file only for LaTeX with babel, and
> go for a different solution when I want to use XeLaTeX?
It would be possible to patch this file for XeTeX/Unicode in a
similar way to others that I've looked at: test if it is being loaded
by XeTeX, and if so, make the characters active and define them to
expand to their Unicode equivalents. That way, the actual pattern
lines can be left untouched, and the file still works as before when
used with a standard TeX.
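A minimal sketch of such a patch, under several assumptions: the
helper \ltmap is hypothetical, only the three lowercase codes visible in
the excerpt (taken from its \lccode values) are mapped, and the Unicode
targets are inferred from the excerpt's comments. It also assumes the
file is loaded after plain TeX or LaTeX has set up the usual catcodes
(~ active, ! with catcode 12).

```tex
% Sketch only; \ltmap is a hypothetical helper, not part of the original file.
% #1 = 8-bit code used in the patterns, #2 = Unicode code point.
\def\ltmap#1#2{%
  \catcode#1=13 % make the 8-bit code an active character
  \lccode#2=#2 % pattern characters need a nonzero \lccode
  \begingroup
    \uccode`\~=#1 \uccode`\!=#2
    \uppercase{\endgroup\def~{!}}% active char #1 now expands to char #2
}

\begingroup
\expandafter\ifx\csname XeTeXrevision\endcsname\relax
  \ltletters % standard TeX: keep the original 8-bit setup
\else
  % XeTeX: map each 8-bit code to its Unicode equivalent
  \ltmap{"A1}{"0105}% a nosine
  \ltmap{"A3}{"010D}% c su pauksteliu
  \ltmap{"A4}{"0117}% e su tasku
  % etc. for every other code the patterns use
\fi
\patterns{
.ap1
.api1
% etc. (pattern lines untouched)
}
\endgroup
```

The group keeps the active definitions from leaking into documents,
while \patterns itself takes effect globally. The \lccode settings made
inside the group are local too, so the format would still need to
assign \lccode values to these Unicode letters for running text.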
I don't see this file among the standard collection, but if you need
assistance in adapting it for XeTeX, feel free to send me a copy.
JK