[tex-live] Re: UTF-8 support

Vladimir Volovich vvv@vsu.ru
Wed, 22 Jan 2003 15:26:19 +0300


"PO" == Petr Olsak writes:

 PO> I am not talking about LaTeX. My encTeX is a solution for all
 PO> macros, not only LaTeX.
 >> LaTeX is a macro package (i.e. it works purely with standard TeX
 >> features), thus it shows that it is possible to use a similarly
 >> robust approach in other macro packages too.

 PO> The LaTeX approach is not robust. This is a reason why I
 PO> developed my encTeX.

 PO> I cite from ucs.sty documentation:

 PO>   UTF-8 characters are interpreted by TeX as a sequence of
 PO> commands, so don't use calls like \macro ä instead of \macro{ä}.

it is always good style to delimit macro arguments with braces.

 PO> It means that I don't completely switch all my old documents to
 PO> UTF-8 because problems can occur! On the other hand, encTeX
 PO> is a really robust solution.

with encTeX, a multibyte UTF-8 character can likewise expand not to a
single letter but to a sequence of several tokens (e.g. a macro call),
so encTeX suffers from exactly the same "problem": you can't be sure
that one UTF-8 character in the input file will become one token, so
you cannot use \macro ä in encTeX either, unless you are sure that ä
will expand to a single character and not to, say, \"a.
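
a minimal sketch of the point about braces, assuming an 8-bit engine
such as pdfTeX with the inputenc utf8 option (on such an engine ä is
two bytes in the input file); \macro is a hypothetical one-argument
macro, not something from ucs.sty or encTeX:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % assumption: 8-bit engine such as pdfTeX
\usepackage[T1]{fontenc}
% hypothetical macro taking one undelimited argument
\newcommand{\macro}[1]{\fbox{#1}}
\begin{document}
\macro{ä}  % safe: the braces delimit the whole multibyte character
% \macro ä % unsafe: #1 would pick up only the first byte of ä
\end{document}
```

the commented-out line is the case the ucs.sty documentation warns
about; with braces the argument is the same no matter how many tokens
the character turns into.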

 PO> The second example: You have written that \write files include
 PO> only the \'A notation of characters in LaTeX. Do you know documents
 PO> where you have to re-read the \write files in verbatim mode? I
 PO> know such documents. What happens in LaTeX in such a situation?

nothing bad: it is perfectly possible to write to files in LaTeX
using the ASCII LICR representation, and then read the files back.
you need to translate \ into, say, \textbackslash, and characters
like Á into \'A (which is the native representation in LaTeX); then,
when you read the file back, everything will be correct:
* Á will be written as \'A, and read back as Á
* \'A will be written as \textbackslash 'A, and read back as \'A
so verbatim representation will be preserved.
(the fancyvrb package contains a lot of this kind of machinery)
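
the round trip can be sketched by hand as follows. this is only an
illustration under assumptions: the stream name \demofile and the
extension .dmo are made up, and the two lines are written explicitly
with \string rather than produced by any automatic translation layer:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\newwrite\demofile % hypothetical output stream
\begin{document}
\immediate\openout\demofile=\jobname.dmo
% the character Á is stored in its ASCII LICR form \'A
\immediate\write\demofile{\string\'A}
% a literal backslash is stored escaped, as \textbackslash 'A
\immediate\write\demofile{\string\textbackslash\space 'A}
\immediate\closeout\demofile
% reading back: line 1 gives the letter, line 2 gives the source text
\input{\jobname.dmo}
\end{document}
```

the file then holds the two ASCII lines \'A and \textbackslash 'A, so
the accented letter and the literal backslash sequence stay distinct
when the file is read again.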

 PO> Please, don't spread the claim that the UTF-8 solution in LaTeX
 PO> is robust. This is not true.

LaTeX is robust; you only need to use it consistently.

Best,
v.