[XeTeX] [tex-hyphen] Hyphenation of polytonic Greek (expressed in Unicode)
Mike Maxwell
maxwell at umiacs.umd.edu
Fri Sep 13 01:20:30 CEST 2013
On 9/12/2013 6:17 PM, Khaled Hosny wrote:
> Some writing systems do not use spaces to separate words, so TeX’s
> normal line breaking algorithm will fail. \XeTeXlinebreaklocale
> instructs XeTeX to break the lines based on the rule of those writing
> systems.
>
> ‹Locale ID› should be the ISO code of the language in question,
Hmm, wouldn't this be insufficient information? Some languages are
written in multiple scripts, and I would not be surprised if word breaks
are signaled differently in those different scripts. Japanese, for example?
> documentation is a bit vague, but it seems to calculate the line
> breaking position based on the Unicode character properties and the
> locale value is simply ignored).
That also seems insufficient, since multiple languages may use the same
script and have different word (and therefore line) breaking
characteristics. It is perhaps closer, though, given that scripts that
don't use spaces are *perhaps* more specific to a particular language, or to a
small set of similar languages--e.g. Chinese script, to the extent that
Cantonese and Mandarin are similar in their word break characteristics.
But here I'm *really* ignorant.
In general, word breaking in scripts that don't indicate word boundaries
is a partly unsolved research problem in computational linguistics--and
from what I've heard, native speakers often disagree about where the
boundaries fall. (If you think
that's odd, you might consider 'doghouse' vs. 'dog house' in English...)
So I suppose it's not surprising if this doesn't work as well in XeTeX
as one might hope.
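
For concreteness, here is a minimal sketch of how the primitive quoted
above is typically set. The locale "th", the font name, the skip value,
and the sample text are illustrative assumptions, not taken from the
quoted documentation; any installed font with Thai coverage should do.

```latex
% Compile with xelatex. A sketch only: the font name is an assumption;
% substitute any installed font that covers Thai.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Noto Serif Thai}

% Ask XeTeX to look for line-break opportunities using Thai rules.
\XeTeXlinebreaklocale "th"
% Allow a little stretch at each break opportunity so lines can justify.
\XeTeXlinebreakskip = 0pt plus 1pt

\begin{document}
% Sample text written without spaces between words.
ภาษาไทยไม่ใช้ช่องว่างระหว่างคำ%
ภาษาไทยไม่ใช้ช่องว่างระหว่างคำ%
ภาษาไทยไม่ใช้ช่องว่างระหว่างคำ%
ภาษาไทยไม่ใช้ช่องว่างระหว่างคำ
\end{document}
```

Whether a bare language code like this carries enough information is
the question raised above.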
--
Mike Maxwell
"The biggest danger is not ignorance,
but the illusion of knowledge."
--Stephen Hawking