[XeTeX] Math class initialization in Unicode-aware engine

Thu Nov 28 00:27:57 CET 2019

On 27/11/2019 23:20, Doug McKenna wrote:
> Another question about Unicode-aware TeX engine (e.g., XeTeX) initialization files.
> 
> The Unicode Consortium provides a file, MathClass.txt, e.g.,
> 
> ./texmf-dist/tex/generic/unicode-data/MathClass.txt
> 
> It contains a list of lines (and comments).  Field 0 of an entry line is a Unicode code point or a range of code points, and field 1 is a single ASCII character that declares the Unicode math class to which the code point or range of code points belongs.
> 
> Comments in that file say that there are (currently) 15 different Unicode math class codes:
> 
> #   N - Normal - includes all digits and symbols requiring only one form
> #   A - Alphabetic
> #   B - Binary
> #   C - Closing - usually paired with opening delimiter
> #   D - Diacritic
> #   F - Fence - unpaired delimiter (often used as opening or closing)
> #   G - Glyph_Part - piece of large operator
> #   L - Large - n-ary or large operator, often takes limits
> #   O - Opening - usually paired with closing delimiter
> #   P - Punctuation
> #   R - Relation - includes arrows
> #   S - Space
> #   U - Unary - operators that are only unary
> #   V - Vary - operators that can be unary or binary depending on context
> #   X - Special - characters not covered by other classes
> 
> During XeTeX format initialization, the file load-unicode-math-classes.tex in that same directory is executed, in order to declare to the engine which Unicode code points belong to which TeX math classes.  The comments in that file say that the classes it pays attention to are those with the following Unicode math codes:
> 
> % This file parses MathClass.txt, provided by the Unicode Consortium, and sets
> % up the following mapping between Unicode classes and TeX math types
> % - "L" (large)       \mathop
> % - "B" (binary)      \mathbin
> % - "V" (vary)        \mathbin
> % - "R" (relation)    \mathrel
> % - "O" (opening)     \mathopen
> % - "C" (closing)     \mathclose
> % - "P" (punctuation) \mathpunct
> % - "A" (alphabetic)  \mathalpha
> 
> That means that there are 7 other Unicode math classes that are unaccounted for.
> 
> Unfortunately, the documentation/comments don't say what happens to entries having these other Unicode math codes (N, D, F, G, S, U, and X).  Are they completely ignored, or are they mapped to one of the other eight codes that matches what TeX is interested in or only capable of handling?
> 
> I can imagine that the space character, given Unicode math class 'S' in MathClass.txt, is ignored during this parse.  But what happens to the '¬' character (U+00AC) ("NOT SIGN"), which is assigned 'U' (Unary Operator)?  Surely the logical not sign is not being ignored during initialization of a Unicode-aware engine, yet the comments in load-unicode-math-classes.tex don't say one way or the other, and it appears to me that the parsing code is ignoring it.
> 
> The ReadMe.md file
> 
> <https://ctan.org/tex-archive/macros/generic/unicode-data>
> 
> is also deficient in answering this question.
> 
> TIA,

Er, I thought the README was reasonably clear, ah well!

The other Unicode math classes don't really map directly to TeX ones, so 
they are currently ignored. Suggestions for improvements here are of 
course welcome.

Joseph
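
For anyone who wants to see which code points fall through, here is a minimal Python sketch of the behaviour described above. It is not the TeX code in load-unicode-math-classes.tex. The mapping table is copied from the comments quoted in the question; the function names, the assumption that fields are separated by ";" and ranges written with "..", and the sample lines are all illustrative (only the classes for U+0020 and U+00AC come from the message above).

```python
# Mapping copied from the comments of load-unicode-math-classes.tex as quoted above.
TEX_CLASS = {
    "L": r"\mathop",
    "B": r"\mathbin",
    "V": r"\mathbin",
    "R": r"\mathrel",
    "O": r"\mathopen",
    "C": r"\mathclose",
    "P": r"\mathpunct",
    "A": r"\mathalpha",
}


def parse_entry(line):
    """Return (first, last, unicode_class), or None for blank/comment lines.

    Assumes ';'-separated fields and 'XXXX..YYYY' ranges (an assumption
    about the file layout, not taken from the thread).
    """
    line = line.split("#", 1)[0].strip()  # drop comments
    if not line:
        return None
    fields = line.split(";")
    if len(fields) < 2:
        return None
    code_range, cls = fields[0].strip(), fields[1].strip()
    first, _, last = code_range.partition("..")
    first_cp = int(first, 16)
    last_cp = int(last, 16) if last else first_cp
    return first_cp, last_cp, cls


def classify(lines):
    """Split code points into those given a TeX class and those ignored."""
    mapped, ignored = {}, {}
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        first_cp, last_cp, cls = entry
        for cp in range(first_cp, last_cp + 1):
            if cls in TEX_CLASS:
                mapped[cp] = TEX_CLASS[cls]
            else:
                ignored[cp] = cls  # N, D, F, G, S, U, X end up here
    return mapped, ignored


# Illustrative input in the assumed layout, not an excerpt of MathClass.txt.
SAMPLE = [
    "# illustrative sample",
    "0020;S",
    "00AC;U",
    "002B;V",
    "0030..0039;N",
]

mapped, ignored = classify(SAMPLE)
for cp, cls in sorted(ignored.items()):
    print(f"U+{cp:04X} has class {cls} and gets no TeX math class")
```

Run against the real file (read line by line), the `ignored` dictionary would list every code point that currently gets no math class, which might be a useful starting point for suggesting mappings for U or F.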