[tex-live] Problems with non-7bit characters in filename

Reinhard Kotucha reinhard.kotucha at web.de
Sat Jul 5 08:24:06 CEST 2014


On 2014-07-04 at 10:40:19 +0200, Ulrike Fischer wrote:

 > Am Fri, 4 Jul 2014 10:32:33 +0200 schrieb Zdenek Wagner:
 > 
 > >> But the main question is why lualatex and xelatex in TeXLive can't
 > >> handle (probably only non-utf8) file names with non-ascii chars *on
 > >> the terminal*. I can reproduce his problem on Win7:
 > >>
 > > The program has to use a system call to find the filesystem encoding
 > > and convert the filename from the filesystem encoding to the program's
 > > internal encoding or vice versa. I am not sure whether it can be done
 > > in lua but definitely not on macro level.
 > 
 > Sure. But we are not on the macro level here but on the "system
 > call" level. Why can't luatex in TeXlive handle the system call
 > to file names correctly?

On Unix everything works as expected, as Markus already confirmed.

  $ luatex "Äöü-Русский язык-日本語.tex" \\bye
  This is LuaTeX, Version beta-0.79.1 (TeX Live 2014) (rev 4971) 
   restricted \write18 enabled.
  ("./Äöü-Русский язык-日本語.tex")
  No pages of output.
  Transcript written on "Äöü-Русский язык-日本語.log".

Unix filesystems store filenames as plain byte strings, which are UTF-8
on any current system; Windows filesystems use UTF-16 (don't know about
ancient FAT filesystems).  As far as the filesystems are concerned,
everything should work on Windows too.

The problem is the user interface.  A German Windows is using CP1252
and, even worse, CP850 on the command line.  Yes, Klaus, the latter
was used by DOS indeed.  And it's still used.
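
To illustrate why two code pages on one machine hurt, here is a small
Python sketch.  It is not taken from LuaTeX or from Windows itself; the
file name and variable names are made up for the example.  It encodes
the same name once in CP1252 and once in CP850 and then reads the
CP1252 bytes as if they were CP850, which is roughly what happens when
a GUI program hands a name to a console program without conversion.

```python
# Hypothetical file name, chosen only for illustration.
name = "Äöü.tex"

# The same characters get different byte values in the two code pages.
gui_bytes = name.encode("cp1252")
console_bytes = name.encode("cp850")
print(gui_bytes)       # b'\xc4\xf6\xfc.tex'
print(console_bytes)   # b'\x8e\x94\x81.tex'

# Interpreting CP1252 bytes as CP850 does not fail, it silently
# produces different characters.
misread = gui_bytes.decode("cp850")
print(misread == name)  # False
```

No error is raised anywhere, which is what makes this kind of mismatch
hard to notice until a file cannot be found.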

Zdeněk said that Samba works fine.  My experience is quite positive
too.  But there is no user interface with crippled national character
encodings involved.  If Samba works on filesystem level, all it has to
do is to convert UTF-8 (Unix) to UTF-16 (Windows) and vice versa.
This is a relatively simple and reliable task.
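
A minimal Python sketch of such a conversion, assuming the names are
valid UTF-8 to begin with.  This is not Samba's code, only a
demonstration that going from UTF-8 to UTF-16 and back loses nothing:

```python
name = "Äöü-Русский язык-日本語.tex"

# What a Unix filesystem would hold, assuming a UTF-8 setup.
utf8_bytes = name.encode("utf-8")

# Convert to UTF-16 (little endian, as used on Windows) ...
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")

# ... and back again.
round_trip = utf16_bytes.decode("utf-16-le").encode("utf-8")

print(round_trip == utf8_bytes)  # True
```

Both encodings cover all of Unicode, so the conversion is reversible
for every valid name.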

Problems occur when user interfaces are involved which are not aware
of Unicode.  Sure, if you know which encoding is used you can convert
any filenames to Unicode.  But what about \openout?  You can convert any
8-bit character encoding to Unicode, but not arbitrary Unicode back to
an 8-bit encoding.

What do you expect to happen if you create a file

  "Äöü-Русский язык-日本語"

with \openout and your terminal only supports crippled national
character encodings?  How can '汉语' be converted to something like
Latin1?  What do you expect to see on screen if LuaTeX says

  Transcript written on "Äöü-Русский язык-日本語.log".

and your terminal supports only CP850? (the Exploder isn't much
better, it only supports CP1252).
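
The asymmetry can be shown with a short Python sketch.  The helper
to_legacy is hypothetical and exists only for this example; it is not
what LuaTeX does.  Every CP850 byte maps to some Unicode character, but
most Unicode characters have no slot in an 8-bit code page.

```python
def to_legacy(name, codec):
    """Return name encoded in codec, or None if a character has no slot."""
    try:
        return name.encode(codec)
    except UnicodeEncodeError:
        return None

# 8-bit to Unicode: all 256 CP850 byte values decode to a character.
decoded = bytes(range(256)).decode("cp850")
print(len(decoded))                     # 256

# Unicode to 8-bit: works only for characters the code page contains.
print(to_legacy("Äöü", "cp850"))        # b'\x8e\x94\x81'
print(to_legacy("Русский", "cp850"))    # None
print(to_legacy("汉语", "latin-1"))      # None

# Forcing the conversion throws the information away.
print("汉语".encode("latin-1", errors="replace"))  # b'??'
```

A name that ends up as '??' can be printed, but it no longer identifies
the file.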

On my Linux box everything works like a charm.  My locale setting is

  LANG=en_US.UTF-8

I've been using this setup for years and don't hesitate to use
non-ASCII characters in file names.  Some of my file/directory names
contain Russian and Korean characters and I haven't encountered any
problems at all.

I'm sure that all this will be possible on Windows too in 20 or 30
years.  Be patient.  Stick to ASCII for the time being.  Or switch to
a reasonable operating system.  On Linux everything works like a charm
and I don't understand why people hesitate to create files with
non-ASCII characters in their names.  For sure, Windows is a pita.

Regards,
  Reinhard

-- 
------------------------------------------------------------------
Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotucha at web.de
------------------------------------------------------------------



