[XeTeX] Converting legacy encodings to utf-8
Jonathan Kew
jonathan_kew at sil.org
Wed Jul 12 22:13:18 CEST 2006
On 12 Jul 2006, at 8:27 pm, Firmicus wrote:
> Will Robertson wrote:
>> But perhaps it's too weighed down with Aleph assumptions/dependence.
>> Note that I suspect a sort of equivalence between OCPs and TECkit
>> mappings...
> I'd be delighted if someone could confirm that! Up to now I have only
> written a few very simple TECkit mappings, and my initial impression was
> that TECkit's functionality is not as ambitious as that of Omega
> Translation Processes (OTPs). But perhaps I should just read the TECkit
> documentation... ;-)
I think it is at least slightly more adequate than the XeTeX
documentation, at this point! :)
> Last year I wrote a set of OTPs to convert ArabTeX input to UTF-8,
> admittedly not a simple task. The results were not yet perfect, but
> pretty decent. Since then I have more or less abandoned Aleph/Omega, at
> least for my own practical purposes: too many bugs and headaches.
>
> Now if it indeed turns out that TECkit provides functionality
> equivalent to that of OTPs, I would be willing to rewrite them as
> ArabTeX -> UTF-8 TECkit mappings for the benefit of XeTeX's users.
It's a very long time since I looked at OTPs, so it's hard to
compare the two. TECkit provides a limited regular-expression-like
capability, so you can do some fairly complex things, but it was not
designed as a completely general-purpose text processing system. But
I suspect it would be perfectly adequate for converting an ASCII
transcription to Unicode, allowing ArabTeX input to be typeset
directly with a Unicode Arabic font. Interesting idea.
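To make the idea concrete, here is a minimal sketch of that kind of transcription-to-Unicode conversion. It is written in Python, not in TECkit's mapping language. The six-entry table is only loosely modelled on ArabTeX-style conventions, and both the table and the function name `transliterate` are hypothetical. A real mapping would also need contextual rules (vowels, hamza, and so on), which this sketch ignores.

```python
# Hypothetical transcription table (an assumption for illustration,
# not ArabTeX's verified conventions): ASCII sequence -> Unicode letter.
TABLE = {
    "b": "\u0628",   # ARABIC LETTER BEH
    "t": "\u062A",   # ARABIC LETTER TEH
    "_t": "\u062B",  # ARABIC LETTER THEH
    "s": "\u0633",   # ARABIC LETTER SEEN
    "^s": "\u0634",  # ARABIC LETTER SHEEN
    "k": "\u0643",   # ARABIC LETTER KAF
}


def transliterate(text, table=TABLE):
    """Convert ASCII transcription to Unicode, longest match first."""
    # Try longer keys before shorter ones so "_t" wins over "t".
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No rule matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Trying the longest sequence first is what keeps a digraph such as `_t` from being read as an underscore followed by a plain `t`.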
One thing you *can't* do with TECkit "font mappings" in XeTeX,
because of the level at which the mapping operation is applied, is
introduce new control sequences, change fonts, etc. The mapping is
applied to a sequence of characters that are to be typeset in a
particular font, and results in a new sequence of characters that
will be rendered with that font; they will not be re-scanned by TeX
in any way.
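A toy model of this restriction, in Python. This is not how XeTeX is implemented: the token representation (pairs tagged `"cs"` or `"chars"`) and the function name `apply_font_mapping` are hypothetical, chosen only to show that the mapping sees character runs and returns character runs.

```python
def apply_font_mapping(tokens, mapping):
    """Apply a character mapping to character runs only.

    tokens  -- list of ("cs", name) or ("chars", text) pairs (hypothetical)
    mapping -- dict of source string -> replacement string
    """
    result = []
    for kind, value in tokens:
        if kind == "chars":
            for src, dst in mapping.items():
                value = value.replace(src, dst)
            # The output is still plain characters for the same font;
            # nothing here is ever turned back into a control sequence.
            result.append(("chars", value))
        else:
            # Control sequences were handled earlier and are untouched.
            result.append((kind, value))
    return result


tokens = [("cs", "emph"), ("chars", "a--b!")]
# The second rule tries to "emit" a control sequence; it cannot.
mapped = apply_font_mapping(tokens, {"--": "\u2013", "!": "\\bf"})
```

In this model the `!` rule yields the literal characters `\bf` inside a `"chars"` run, to be rendered as glyphs, which mirrors the point that the mapped text is not re-scanned.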
> (Despite the availability of Unicode bidi editors nowadays, there are
> still compelling reasons why one -- in particular linguists,
> orientalists, or historians of science like myself -- would prefer to
> input a language such as Arabic by means of an intelligent ASCII
> encoding convention. But this is another story.)
Yes, I can understand this wish.
JK