[tex-live] xindy vs utf8 latex

Zdenek Wagner zdenek.wagner at gmail.com
Tue May 6 12:14:47 CEST 2014


2014-05-06 11:57 GMT+02:00 David Carlisle <d.p.carlisle at gmail.com>:

>
> On 6 May 2014 10:05, Lars Madsen <daleif at imf.au.dk> wrote:
>
>> Hi
>>
>> I was wondering, is any work being done on making xindy/texindy work with
>> latex and utf8?
>>
>> This question highlights the problem, and one of the answers gives a good
>> workaround.
>>
>> http://tex.stackexchange.com/q/153858/3929
>>
>> If a xindy fix is too far in the future, would it make sense to provide
>> the iec2utf script from https://github.com/michal-h21/iec2utf as part
>> of TL, and perhaps even to write a Perl wrapper for it, making it easier
>> for the casual user?
>>
>> (is piping available on all platforms?)
>>
>>
> Rather than make xindy understand LaTeX's somewhat idiosyncratic character
> representation, it would probably be better to have an option in inputenc
> to write index files in utf8. As discussed recently on latex-l, there have
> been "modest" (hello Karl:-) changes in this area in the 2014/05/01 latex
> release, and there are plans to better support inputenc on xetex/luatex in
> the near future. A requirement for better support would be translation
> between the traditional LICR (LaTeX internal character representation) and
> utf8 characters, so that could probably also be used here with pdftex
> auxiliary files.
>

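To make the LICR-to-UTF-8 translation concrete, here is a minimal Python sketch of the kind of post-processing the iec2utf script is meant to do on an .idx file (this is not its actual code). It assumes protected characters appear as `\IeC {...}` groups without nested braces; the function name `licr_to_utf8` and the tiny mapping table are hypothetical and cover only a few characters.

```python
import re

# Hypothetical, deliberately tiny LICR -> Unicode table (an assumption).
# A real table would be generated from inputenc's definitions.
LICR_TO_UNICODE = {
    r"\v Z": "Ž",
    r"\v z": "ž",
    r"\v c": "č",
    r"\'a": "á",
    r"\r a": "å",
    r"\o": "ø",
}

# Matches \IeC {...} groups whose body contains no nested braces.
IEC_GROUP = re.compile(r"\\IeC\s*\{([^{}]*)\}")


def licr_to_utf8(line: str) -> str:
    """Replace known \\IeC {...} groups in one .idx line with UTF-8 characters."""
    def replace(match: re.Match) -> str:
        # Control words such as \o are written with a trailing space.
        body = match.group(1).strip()
        # Unknown forms are left untouched rather than guessed.
        return LICR_TO_UNICODE.get(body, match.group(0))

    return IEC_GROUP.sub(replace, line)


if __name__ == "__main__":
    print(licr_to_utf8(r"\indexentry{\IeC {\v Z}ena|hyperpage}{3}"))
```

Leaving unrecognised groups as they are keeps the index file usable even when the table is incomplete.
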
I am not sure whether all this can be achieved at the level of the expand
processor, where inputenc works, but I know that it is already implemented in
encTeX. I use it regularly for Czech, specifying "-I omega" with texindy
(because the input markup is then the same as in Omega). I am not sure
whether utf8-t1.tex contains the characters needed for Swedish, Norwegian,
Danish (and other European languages), but adding them would be simple. The
file was prepared by Petr Olšák for Czech and Slovak, so non-Latin
characters are not present, but adding complete Unicode (or preparing
other tables) should not be a problem.

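Much of such table preparation is mechanical. As an illustration only (this is neither the format of utf8-t1.tex nor encTeX's own declarations), the Python sketch below derives a LaTeX accent form for a character from its Unicode canonical decomposition, with a small hand-written list for letters such as æ and ø that have no decomposition. The function name `licr_for` and both tables are hypothetical.

```python
import unicodedata
from typing import Optional

# Combining marks -> LaTeX accent commands (only a few, for illustration).
ACCENT_COMMANDS = {
    "0301": "\\'",  # combining acute accent
    "0308": '\\"',  # combining diaeresis
    "030A": "\\r",  # combining ring above
    "030C": "\\v",  # combining caron
}

# Letters with no canonical decomposition need hand-written entries.
SPECIAL_LETTERS = {"æ": "\\ae", "Æ": "\\AE", "ø": "\\o", "Ø": "\\O"}


def licr_for(ch: str) -> Optional[str]:
    """Return a LaTeX form for ch, or None if this sketch cannot derive one."""
    if ch in SPECIAL_LETTERS:
        return SPECIAL_LETTERS[ch]
    parts = unicodedata.decomposition(ch).split()
    # Only handle "<base> <combining mark>" canonical decompositions.
    if len(parts) == 2 and parts[1] in ACCENT_COMMANDS:
        base = chr(int(parts[0], 16))
        command = ACCENT_COMMANDS[parts[1]]
        # Letter commands such as \r need a space before the base letter;
        # symbol commands such as \' do not.
        separator = " " if command[-1].isalpha() else ""
        return f"{command}{separator}{base}"
    return None


if __name__ == "__main__":
    for letter in "åäöÅÄÖæøÆØ":
        print(f"U+{ord(letter):04X}  {letter}  {licr_for(letter)}")
```
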
I am aware of only two problems:
1. url.sty contains some definitions using the ^^ convention, which then
look like illegal UTF-8 characters (the encxvlna documentation shows how to
solve this).
2. There is a conflict with microtype.sty if protrusion is used (there is no
conflict with font expansion); a solution would require a hook in microtype.

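Why the ^^ convention can upset a UTF-8-aware input layer: `^^` followed by two lowercase hexadecimal digits denotes a single byte, and a byte in the range 80 to ff on its own is not a valid UTF-8 sequence. The Python sketch below expands that notation and checks the result. It handles only the two-hex-digit form, and the sample strings are made up; they are not the actual definitions from url.sty. The function names are hypothetical.

```python
import re

# TeX's "^^xy" notation with two lowercase hex digits denotes one byte.
# (The other form, ^^ followed by a single character, is ignored here.)
HAT_HEX = re.compile(r"\^\^([0-9a-f]{2})")


def hats_to_bytes(text: str) -> bytes:
    """Expand ^^xy escapes in an ASCII string into the raw bytes they denote."""
    out = bytearray()
    pos = 0
    for match in HAT_HEX.finditer(text):
        out += text[pos:match.start()].encode("ascii")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # A complete two-byte sequence versus lone high bytes.
    for sample in ["^^c5^^bd", "^^e9", "^^80"]:
        print(sample, is_valid_utf8(hats_to_bytes(sample)))
```
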

> We have some internal tests, although the lack of \Uchar in xetex and
> existing bugs in xetex's ^^^^ parsing make supporting xetex tricky at the
> moment: you can generate the Unicode number, but you cannot generate a
> character with that number in xetex. (Writing utf8 from pdftex, starting
> from LaTeX's internal form, wouldn't be hard.)
>
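The point that writing UTF-8 from an 8-bit engine such as pdftex would not be hard rests on the fact that the engine only has to emit the right bytes for each code point. The Python sketch below spells out the standard UTF-8 bit layout (RFC 3629) and renders the bytes in TeX's ^^ notation. It illustrates the encoding only; it is not the LaTeX team's test code, and the function names are hypothetical.

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (RFC 3629 layout)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def as_tex_hats(data: bytes) -> str:
    """Render bytes in TeX's ^^xy notation (two lowercase hex digits)."""
    return "".join(f"^^{b:02x}" for b in data)


if __name__ == "__main__":
    # U+017D is LATIN CAPITAL LETTER Z WITH CARON, i.e. \v Z in LaTeX.
    print(as_tex_hats(utf8_bytes(0x017D)))  # ^^c5^^bd
```
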
> That's not to say the script shouldn't be added to TL (I can't comment on
> that); having more than one way to do something is always useful. But a
> bug/feature request in the latex-bug database addressing that stackexchange
> question would be useful, so we don't forget :-)
>
> David
>
>


-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz