[XeTeX] Assignment of codes (particularly \catcode) based on Unicode data
Joseph Wright
joseph.wright at morningstar2.co.uk
Wed May 6 22:15:07 CEST 2015
On 06/05/2015 15:09, Jonathan Kew wrote:
> On 6/5/15 14:14, Joseph Wright wrote:
>
>> Based on the current files, we have a block to set \XeTeXcharclass,
>> which only applies to XeTeX. The logic followed in that code is that
>> characters in the file LineBreak.txt which have class "ID" (ideographs)
>> not only have their \XeTeXcharclass set to 1 but also have the \catcode of
>> the code point set to 11. That leads to a difference between the two Unicode
>> engines. My current feeling is that the data file should split this
>> process such that the category code change applies to both XeTeX and
>> LuaTeX, with the XeTeX-specific code separate. Does this make sense and
>> indeed does the current assignment make sense?
>>
>
> ISTM that the most appropriate (default) \catcode for characters with
> class ID is clearly letter (11), and would suggest that LuaTeX should
> follow XeTeX in this.
Well, for LaTeX at least, the team get to make the call here, and I think
we will pull everything into line.
> So yes, splitting out the XeTeX-specific code and having LuaTeX share
> the catcode assignments makes sense.
OK, if there are no objections I have a plan on this (I'll actually keep
all of the data, I think, and alter the assignment code).
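
A minimal sketch of the split discussed above, for illustration only. It
assumes plain TeX or LaTeX (for \loop) on an engine with e-TeX's \ifdefined.
\SetIDRange is a hypothetical helper, not a macro from the existing data
files, and \count255 is used as a scratch register. The \catcode assignment
runs on both engines, while the \XeTeXcharclass assignment is guarded so that
it only runs under XeTeX.

```tex
% Hypothetical helper: \SetIDRange{<first hex>}{<last hex>}
% Treats every code point in the range as class "ID".
\def\SetIDRange#1#2{%
  \count255="#1
  \loop
    \catcode\count255=11 % letter: shared by XeTeX and LuaTeX
    \ifdefined\XeTeXcharclass
      \XeTeXcharclass\count255=1 % XeTeX-specific part
    \fi
    \ifnum\count255<"#2
      \advance\count255 by 1
  \repeat
}

% Illustrative call only: a short run starting at U+4E00.
\SetIDRange{4E00}{4E0F}
```

The real ranges would come from the class "ID" entries in LineBreak.txt; the
range in the call above is just an example.
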
> After all, if users can write control sequences such as
>
> \hello
> \halló
> \Здравствуйте
> \ሰላም
> \सलाम
>
> they should equally well be able to write
>
> \你好
> \こんにちわ
>
> and have each of these treated as single control sequences, too. This
> will not work if characters of class ID are given catcode 12 (other).
Entirely reasonable.
> If you're making improvements to unicode-letters.def, I would suggest
> also adding a section that assigns catcode 15 (invalid) to the code
> values "D800 - "DFFF (i.e. the UTF-16 surrogates, which should never be
> used in isolation as characters).
Noted: easy enough to add.
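
One way the surrogate suggestion might look, as a sketch rather than the
code that will end up in the file. It assumes plain TeX or LaTeX (for \loop)
and uses \count255 as a scratch register.

```tex
% Sketch only: give every UTF-16 surrogate code point ("D800 to "DFFF)
% catcode 15 (invalid character).
\count255="D800
\loop
  \catcode\count255=15 % invalid
  \ifnum\count255<"DFFF
    \advance\count255 by 1
\repeat
```

The assignments here are local, so a version run inside a group would need
\global on the \catcode line.
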
--
Joseph Wright