[XeTeX] Re: XeTeX & Unicode vs. standard LaTeX

Mon Oct 11 15:53:52 CEST 2004

Hi -

Thanks for the info.  My approach was meant to get me through one long
document rather than every document :-)
I almost never have to set math or computer code, but I'll look into the
commands you describe in your message.  Thanks again.

On Oct 10, 2004, at 6:10 PM, Ross Moore wrote:

> This is already incorporated into  utf8accents.sty , along with a lot
> more commands from the various other font-encoding (.enc) files.
>
> Simply making \renewcommand  definitions isn't really the right approach.
> It's fine for a single document using just one kind of font.
> But if you are mixing different fonts (e.g. because of mathematics,
> computer code, multiple languages, etc.) then you may need a high-level
> macro such as \textdollar to result in a different character depending
> upon the font being used in the particular context.
> (Is it a tfm-based CM or Euler font, or an AAT or OTF font?)
>
>
> Thus you want the high-level definition to be done in such a way that
> the current \fontencoding  is taken into account.
>
> LaTeX provides commands for this:
>   \DeclareTextCommand   \DeclareTextSymbol   \DeclareTextAccent
> and 'Default' versions:
>   \DeclareTextCommandDefault   \DeclareTextSymbolDefault   \DeclareTextAccentDefault
> as well as
>   \DeclareTextComposite   and   \DeclareTextCompositeCommand
> and
>   \DeclareTextFontCommand  for defining font-switching macros.
>
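
A minimal sketch of how the encoding-specific declarations listed above might
fit together. The macro \demosymbol and its replacement texts are hypothetical,
made up purely for illustration; they are not part of LaTeX or of
utf8accents.sty. The idea is only that one high-level macro expands
differently depending on the current \fontencoding.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}  % makes T1 available alongside the default OT1

% \demosymbol is a hypothetical macro, used here only for illustration.
\DeclareTextCommand{\demosymbol}{OT1}{(OT1 definition)}    % used when the encoding is OT1
\DeclareTextCommand{\demosymbol}{T1}{(T1 definition)}      % used when the encoding is T1
\DeclareTextCommandDefault{\demosymbol}{(default definition)} % fallback for other encodings

\begin{document}
% The same macro picks a different definition in each group.
{\fontencoding{OT1}\selectfont OT1 text: \demosymbol}\par
{\fontencoding{T1}\selectfont T1 text: \demosymbol}
\end{document}
```

A plain \renewcommand would instead fix one definition for the whole document,
whatever font happens to be active.
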
>
> These are the commands that should be used, wherever possible.
> Alternatively study the innards of how these work, and mimic that.
>
> The latter is what is done in  utf8accents.sty  with its commands
>
>   \DeclareUTFcharacter
>     (for a Unicode version of \DeclareTextCharacter)
>
> and
>
>  \DeclareEncodedCompositeCharacter
>  \DeclareEncodedCompositeAccents
>
> for handling accents and other composite-pair constructions.
>
>
> Thus many issues of backwards-compatibility with existing (La)TeX
> practices are solved for XeTeX simply by loading  utf8accents.sty .
>
> As there have been quite a few requests for this lately,
> here it is again (in version v0.4).
>
> <utf8accents.sty>
>
>
>
> However  utf8accents.sty  doesn't solve the ligature problems,
> which are of a quite different character (sic).
> That's why the following is such great news ...
>
>>> However, we obviously cannot expect mainstream font vendors to add  
>>> support for TeX's unique keying conventions to their font tables.  
>>> Therefore, I have just implemented a "font mapping" scheme (this was  
>>> first suggested on the XeTeX list by Ross Moore, IIRC), which allows  
>>> an arbitrary mapping of Unicode character sequences to be associated  
>>> with a particular font. So having defined a mapping "tex-text" that  
>>> includes entries such as:
>>>
>>>     U+002D U+002D         >  U+2013 ; endash
>>>     U+002D U+002D U+002D  >  U+2014 ; emdash
>>>     U+0060 U+0060         >  U+201C ; opening double quote
>>>     ; etc....
>>>
>>> I can then load a font with a command like
>>>
>>>     \font\pal = "Palatino:mapping=tex-text" at 12pt
>>>
>>> and whenever this font is used, XeTeX will pass the Unicode  
>>> character sequence to be typeset (at the lowest level, after all  
>>> macro expansion, etc.) through this mapping, and the standard TeX  
>>> ligatures will work as expected.
>>>
>>> This was just implemented on Friday, and seems to be working well.  
>>> It will be present in the next release of XeTeX (along with that  
>>> OpenType ligature bug-fix, and perhaps another feature or two). Stay  
>>> tuned! :-)
>
>
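
A small sketch of how the font-loading command quoted above might be used in
a document. It assumes a XeTeX release that includes the mapping feature, an
installed Palatino, and a compiled "tex-text" mapping that XeTeX can find; the
closing-quote entry ('') is assumed to be among the "etc." entries of the
mapping, since it is not listed explicitly.

```latex
\documentclass{article}
\begin{document}
% Load the font with the mapping attached, as in the quoted message.
\font\pal = "Palatino:mapping=tex-text" at 12pt

% TeX-style input: `` and '' for quotes, -- and --- for dashes.
% With the mapping active these sequences should reach the font as
% U+201C, U+201D (assumed entry), U+2013 and U+2014 respectively.
{\pal ``Quoted text'' on pages 3--5 --- typed with the usual TeX conventions.}
\end{document}
```
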
> With this, and Will's new .fd  files, and  utf8accents.sty ,
> we should be very close to having full backward compatibility
> with legacy LaTeX documents.
>
> By this I mean that it should be possible to apply a new selection
> of (Macintosh) fonts to old LaTeX documents, just by making
> minimal changes to which packages are loaded in the preamble.
>
> I'd urge everyone to try this with some of your old documents,
> and report back to the list on special cases that are not being
> processed correctly.
>
>
>>>> This sounds fantastic. Is this substitution scheme going to have a  
>>>> syntax permitting the use of character ranges and maybe even  
>>>> replacement patterns? So that one might be able to reorder  
>>>> character positions saying something like (assuming syntax  
>>>> resembling grep):
>>>>
>>>>  ([U+0915-U+0939]) (U+0930) > \2\1
>>>>
>>>>  I suppose one could spell out these substitutions for each case,  
>>>> but it would save time...
>>>
>>> Yes. For more on the mapping engine (primarily focused on  
>>> byte<->Unicode encoding conversion, but being used here to do  
>>> transformations of a Unicode text stream), see:
>>>
>>> 	http://scripts.sil.org/teckit
>>>
>>> The software currently there is primarily for Windows, but I'll post  
>>> OS X versions too.
>>>
>
>  ... and this aspect should open up a whole new ball-game
> for handling transliterations.
>
>
>
>
> All the best,
>
> 	Ross
>
>
>
>>>
>>> Jonathan
>>>
>>> _______________________________________________
>>> XeTeX mailing list
>>> postmaster at tug.org
>>> http://tug.org/mailman/listinfo/xetex
>>>
>>>
>> -- chris ciotti <chris_ciotti at yahoo.com>
>> http://www.keyserver.net/en/
>> Key ID: 0x0BD2B97A
> ------------------------------------------------------------------------
> Ross Moore                                         ross at maths.mq.edu.au
> Mathematics Department                             office: E7A-419
> Macquarie University                               tel: +61 +2 9850 8955
> Sydney, Australia                                  fax: +61 +2 9850 8114
> ------------------------------------------------------------------------
-- 
chris ciotti <chris_ciotti at yahoo.com>
http://www.keyserver.net/en/
Key ID: 0x0BD2B97A