[tex-k] tex-k Digest, Vol 189, Issue 11
Doug McKenna
doug at mathemaesthetics.com
Sat Nov 21 17:28:19 CET 2020
Wolfgang Helbig wrote:
>| This rules out UTF-8, which is ASCII for
>| characters 0..127 and 16 bit codes above 127.
The second part of this statement is incorrect. UTF-8 is a variable-length encoding that converts any 21-bit Unicode code point into a 1-, 2-, 3-, or 4-byte sequence. If the high bit of the first byte in the sequence is not set, then it's a 1-byte "sequence" representing the 7 bits of ASCII, from 0 to 127. Otherwise, a character (code point) is encoded as 2, 3, or 4 bytes, depending on where the code point lies in the full Unicode range (ignoring grapheme clusters).
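To make the byte counts concrete, here is a minimal sketch in Python of how a single code point maps to its UTF-8 bytes. The function names utf8_length and utf8_encode and the LEAD table are made up for illustration; they are not taken from TeX or from any library. The range boundaries (0x7F, 0x7FF, 0xFFFF, 0x10FFFF) are the standard UTF-8 ones.

```python
# Lead-byte prefixes for multi-byte sequences (illustrative table name).
LEAD = {2: 0xC0, 3: 0xE0, 4: 0xF0}


def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")
    if code_point <= 0x7F:      # 7 bits: plain ASCII, high bit clear
        return 1
    if code_point <= 0x7FF:     # up to 11 bits
        return 2
    if code_point <= 0xFFFF:    # up to 16 bits
        return 3
    return 4                    # up to 21 bits


def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by hand, without using str.encode."""
    n = utf8_length(code_point)
    if n == 1:
        return bytes([code_point])
    out = []
    # Continuation bytes carry 6 payload bits each, marked with 10xxxxxx.
    for _ in range(n - 1):
        out.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # The remaining high bits go into the lead byte.
    out.append(LEAD[n] | code_point)
    return bytes(reversed(out))
```

For example, utf8_encode(0x41) gives the single byte 0x41, while utf8_encode(0xE9) gives the two bytes 0xC3 0xA9 and utf8_encode(0x20AC) gives the three bytes 0xE2 0x82 0xAC. None of the bytes in a multi-byte sequence falls in the range 0..127.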
Doug McKenna