[tex-live] Problems with non-7bit characters in filename

Reinhard Kotucha reinhard.kotucha at web.de
Sat Jul 5 03:24:53 CEST 2014


On 2014-07-04 at 09:42:45 +0100, Robin Fairbairns wrote:

 > Reinhard Kotucha <reinhard.kotucha at web.de> wrote:
 > 
 > >  > While latin1 can include every possible character, UTF-8 cannot.
 > > 
 > > This is definitely wrong.  The opposite is true.
 > 
 > no, it's correct: iso 8859-1 has no "forbidden" octets (it does, iirc,
 > have some unassigned ones)
 > 
 > whereas
 > 
 > utf-8 rejects some octets in some contexts, since it's generating a
 > 32-bit glyph from 8-bit input.  (it's complicated.  honest.)

True, but we were talking about characters, and a character is not
necessarily an octet.  I suppose that the confusion arose because we
don't use the term 'character' in the same way.
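
Both views can be checked separately.  Here is a minimal Python 3
sketch, assuming the standard 'latin-1' and 'utf-8' codecs; the helper
names is_valid_utf8 and fits_latin1 are made up for illustration.
Python's 'latin-1' codec maps the octets 0x80-0x9F to the C1 control
codes, so it accepts every octet.

```python
# Octet view: every possible octet value decodes as Latin-1.
all_octets = bytes(range(256))
latin1_text = all_octets.decode("latin-1")  # never raises


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the octet sequence is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Octet view: UTF-8 rejects some octets in some contexts.
print(is_valid_utf8(b"\xe9"))      # lone Latin-1 e-acute octet
print(is_valid_utf8(b"\xc3\xa9"))  # the same character in UTF-8

# Character view: Latin-1 cannot represent every character.
print(fits_latin1("\u00e9"))       # e-acute, U+00E9
print(fits_latin1("\u20ac"))       # euro sign, U+20AC
```

So Latin-1 accepts every octet but covers only 256 characters, while
UTF-8 rejects some octet sequences but can encode every Unicode
character.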

We are talking about input encodings which map characters to numbers.
With the advent of Unicode, these numbers can be > 255, hence
some characters have to be represented by a sequence of octets.
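
As an illustration, a short Python 3 sketch which prints the number
assigned to a few characters and the octets UTF-8 uses for them.  The
function name utf8_octets and the sample characters are my own choice.

```python
def utf8_octets(ch: str) -> list[int]:
    """Return the UTF-8 octets of a single character as integers."""
    return list(ch.encode("utf-8"))


# "A", e-acute and the euro sign, written as escapes to stay 7-bit clean.
for ch in ("A", "\u00e9", "\u20ac"):
    number = ord(ch)  # the number the character is mapped to
    octets = " ".join(f"{b:02x}" for b in utf8_octets(ch))
    print(f"U+{number:04X} ({number}) -> {octets}")
```

The euro sign has the number 8364, which doesn't fit into one octet,
so UTF-8 represents it by three octets.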

There are definitely a lot of things which lead to the confusion.  In
C, for instance, there is a data type 'unsigned char'.  On practically
all current platforms it's actually an octet, not a character.
However, before encodings other than ASCII came up, there was no
difference between characters and bytes.
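
For a filename the difference shows up as soon as you count.  A small
Python 3 sketch, assuming a hypothetical filename with one non-ASCII
character; the name and the helper sizes are made up for illustration.

```python
# Hypothetical filename containing u-umlaut (U+00FC).
name = "M\u00fcller.tex"


def sizes(text: str) -> dict[str, int]:
    """Count characters, and octets under two encodings."""
    return {
        "characters": len(text),
        "latin-1 octets": len(text.encode("latin-1")),
        "utf-8 octets": len(text.encode("utf-8")),
    }


for label, count in sizes(name).items():
    print(f"{label}: {count}")
```

In Latin-1 the number of octets equals the number of characters.  In
UTF-8 the filename needs one octet more than it has characters.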

BTW, UTF-8 doesn't generate glyphs.  It encodes character numbers
(code points) as octets.  A glyph is the graphical representation of a
character.  A font contains glyphs.  But since we are talking about
low-level character encodings, we don't have to worry about glyphs.
Of course we need a font which provides all the glyphs in order to get
reasonable output on a terminal, but that's less relevant when we are
talking about filename encodings.

Regards,
  Reinhard
  
-- 
------------------------------------------------------------------
Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotucha at web.de
------------------------------------------------------------------


