[XeTeX] Converting legacy encodings to utf-8

Wed Jul 12 22:13:18 CEST 2006

On 12 Jul 2006, at 8:27 pm, Firmicus wrote:

> Will Robertson wrote:
>> But perhaps it's too weighed down with Aleph assumptions/dependence.
>> Note that I suspect a sort of equivalence between OCPs and TECkit
>> mappings...
> I'd be delighted if someone could confirm that! Up to now I have only
> written a few very simple TECkit mappings, and my initial impression was
> that TECkit's functionality is not as ambitious as that of Omega
> Translation Processes (OTPs). But perhaps I should just read the TECkit
> documentation... ;-)

I think it is at least slightly more adequate than the XeTeX  
documentation, at this point! :)

> Last year I wrote a set of OTPs to convert ArabTeX input to UTF-8,
> admittedly not a simple task. The results were not yet perfect, but
> pretty decent. Since then I have more or less abandoned Aleph/Omega, at
> least for my own practical purposes: too many bugs and headaches.
>
> Now if it indeed turns out that TECkit provides the equivalent
> functionality of OTPs, I would be willing to rewrite my ArabTeX -> UTF-8
> conversion as TECkit mappings for the benefit of XeTeX's users.

It's a very long time since I looked at OTPs, so it's hard to  
comment. TECkit provides a limited regular-expression-like  
capability, so you can do some fairly complex things, but it was not  
designed as a completely general-purpose text processing system. But  
I suspect it would be perfectly adequate for an ASCII transcription  
to Unicode conversion, allowing ArabTeX input to be directly typeset  
with a Unicode Arabic font. Interesting idea.

One thing you *can't* do with TECkit "font mappings" in XeTeX,  
because of the level at which the mapping operation is applied, is  
introduce new control sequences, change fonts, etc. The mapping is  
applied to a sequence of characters that are to be typeset in a  
particular font, and results in a new sequence of characters that  
will be rendered with that font; they will not be re-scanned by TeX  
in any way.

> (Despite the availability of Unicode bidi editors nowadays, there are
> still compelling reasons why one -- in particular linguists,
> orientalists, or historians of science like myself -- would prefer to
> input a language such as Arabic by means of an intelligent ASCII
> encoding convention. But this is another story.)

Yes, I can understand this wish.

JK