[XeTeX] Table of contents

Ross Moore ross at ics.mq.edu.au
Sat May 1 00:51:46 CEST 2010


Hi Jonathan,

On 01/05/2010, at 4:06 AM, Jonathan Kew wrote:

> The problem is that at this point, the .aux file is read *with*  
> your \XeTeXdefaultencoding declaration in force, so the individual  
> utf-8 bytes that were written to it now get interpreted as cp1252  
> characters and mapped to their Unicode values, instead of the byte  
> sequences being interpreted as utf-8. That's the source of the  
> "junk" you're getting. Those utf-8-bytes-interpreted-as-cp1252 then  
> get re-encoded to utf-8 sequences as the .toc is written, so in  
> effect the original characters have been "doubly encoded".
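
To see the double encoding concretely: "é" is U+00E9, which utf-8
encodes as the two bytes 0xC3 0xA9. Read as cp1252, those bytes are
the two characters "Ã" (U+00C3) and "©" (U+00A9), and those in turn
re-encode to utf-8 as the four bytes 0xC3 0x83 0xC2 0xA9. So the .toc
ends up containing "Ã©" wherever an "é" was intended.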

This sounds like a pretty generic kind of problem, ...

>
> In this particular case, at least, you can work around the problem  
> by resetting the default encoding immediately before the end of the  
> document, so that when LaTeX reads in the .aux file at the end of  
> the run, it reads it correctly as utf-8. In other words, if you  
> modify this example to become:
>
>   \documentclass[10pt,a4paper]{book}
>   \usepackage[frenchb]{babel}
>   \usepackage{fontspec}
>   \usepackage{xunicode}
>   \usepackage{xltxtra}
>   \begin{document}
>   \frontmatter
>   \tableofcontents
>   \XeTeXinputencoding "cp1252"
>   \XeTeXdefaultencoding "cp1252"
>   \mainmatter\setcounter{secnumdepth}{2}
>   \chapter{Général de Gaulle}
>   Il était français.
>   \XeTeXdefaultencoding "utf-8"
>   \end{document}
>
> then your table of contents should correctly show "Général".

... so that the best solution might be to include
a command such as:

    \AtEndDocument{\XeTeXdefaultencoding "utf-8"}

into the xltxtra package, so that it is always done
and authors do not need to worry about it.
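
As a rough sketch of what such a patch might look like (the
\ifdefined guard here is my own assumption, so that the file
would still load on engines that lack the XeTeX primitive):

    \ifdefined\XeTeXdefaultencoding
      \AtEndDocument{\XeTeXdefaultencoding "utf-8"}%
    \fi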

Note that \@enddocumenthook is expanded more or less
immediately after \end{document} has been encountered,
and certainly before the .aux file is closed for writing
and re-opened for reading.

viz. (from latex.ltx):

>>> \def\enddocument{%
>>>    \let\AtEndDocument\@firstofone
>>>    \@enddocumenthook
>>>    \@checkend{document}%
>>>    \clearpage
>>>    \begingroup
>>>      \if@filesw
>>>        \immediate\closeout\@mainaux
>>>        \let\@setckpt\@gobbletwo
>>>        \let\@newl@bel\@testdef
>>>        \@tempswafalse
>>>        \makeatletter \input\jobname.aux
>>>      \fi

> However, there may be other situations where auxiliary files are  
> written and read at unpredictable times during the processing of  
> the document, making it more difficult to control the encodings at  
> the right moments.

True. That is a further advantage of having the solution
recorded in a standard place such as xltxtra.sty,
preferably with some comments about why it is needed.
The fix can then be found, and patched into the coding
wherever other kinds of auxiliary files are being
written and read back in.
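
The pattern would presumably be the same in each case.
A hypothetical sketch, where .xyz stands in for whatever
auxiliary file a package writes in utf-8 and reads back
mid-document:

    \XeTeXdefaultencoding "utf-8"   % subsequent files read as utf-8
    \input{\jobname.xyz}            % hypothetical auxiliary file
    \XeTeXdefaultencoding "cp1252"  % restore the legacy encoding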

> In general, moving to an entirely utf-8 environment is a better and  
> more robust way forward.

True again, for new documents.
It is still desirable to provide solutions that cope
with the technicalities that arise in other situations,
such as legacy documents kept in 8-bit encodings.


>
> HTH,
>
> Jonathan


All the best,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross at maths.mq.edu.au
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
