[XeTeX] Localised [Xe][La]TeX (was : Localized XeLaTeX (was : Greek XeLaTeX))

Fri Oct 15 11:53:28 CEST 2010

Keith J. Schultz wrote:

> 	Like I said, the best point to confront the problem is in the parser, at
> 	a low level, directly in the XeTeX engine, so that the "normal" text is
> 	distinguished from the markup.
>
> 	There seems to be a consensus that it would be a good idea to have the
> 	markup localized. The idea seems workable.
>
> 	But the question remains: is anyone willing to do the work on the XeTeX
> 	engine, or is anybody interested in doing so?

Much as I am in favour of any proposal which will allow more
people to use TeX (or any other system) through the medium
of their own first language, I think that there are many Many
MANY problem areas to be resolved before we start discussing
who might do the work.  And it is not even clear to me that
implementing this at the level of the parser is necessarily
the optimal solution, since, unless there were a matching
"unparser", one could still not obtain (say) the American
English equivalent of a document marked up in (say) Sinhala,
as a result of which only those familiar with Sinhala could
help another Sinhala speaker with his/her problems.  So
let me outline what I see as some of the more difficult
problems and ask if solutions to these are obvious to others.

1) The TeX language consists in part of control words,
control symbols and keywords, together with a small
number of characters that are "reserved" in some sense
as a result of their having a particular category code.
And only during the processing of a TeX document is it
possible to know with absolute certainty what role any
particular character or sequence of characters is playing,
since TeX allows (almost) /everything/ to change its
meaning on the fly.  Thus the only program that could,
with 100% certainty, convert a document marked up in
Language A to one marked up in Language B would be TeX
itself.  But as TeX was not written with this functionality
in mind, it would have to be retrofitted to the TeX
source itself : a distinctly non-trivial task.

Comment : (1) deals solely with monolithic documents :
those that make no reference to anything other than
themselves.
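To make (1) concrete, here is a minimal Plain TeX sketch (the
choice of "|" and of the macro name \foo is purely illustrative)
showing how a document can change the meaning of characters and
of primitives while it is being processed, so that a translator
working on the raw text could not reliably tell markup from content :

	\catcode`\|=0     % from here on, "|" is an escape character, like "\"
	|def|foo{bar}     % the control words \def and \foo, spelled with "|"
	\foo              % expands to (and typesets) "bar"
	\let\input=\relax % \input no longer reads a file; a following
	                  % file name would simply be typeset as text

A program scanning the source for backslashes would find no
control words at all on the second line, yet TeX sees \def and
\foo there; only TeX itself, running, knows the true role of each
character.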

2) In real life, monolithic documents have virtually disappeared.
Even the simplest letter that I write, for example,
will \input A4-Letter, which will itself \input A4 and
\input Letter; and most things that I write will \input
many files rather than just one.  And as a Plain TeX
user, I am one of a tiny, vanishing minority : most
will be using LaTeX and some will be using ConTeXt.
In both cases, there will be an automatic requirement
that other files be \input.  Consider the following :

	\documentclass {minimal}
	\begin {document}
	\end {document}

and then consider its log file, which reads (in part)

	(e:/TeX/Live/2010/texmf-dist/tex/latex/base/minimal.cls
		Document Class: minimal 2001/05/25 Standard LaTeX minimal class
	)

Note that even this most trivial of documents requires at least
one adjunct file : "minimal.cls", in this case.

Now "minimal.cls" is written using standard LaTeX markup, which
is based on American English; thus at the point of \inputting
this file, a "Universal" TeX processor would have to detect that
a new file was being processed, ascertain the language in which
it was marked up (and no such file currently carries any
"markup-language" pragma to indicate its markup language),
process this file under a different language régime, and then
revert to the original language régime once the file being
\input had ended.

Comment : (2) deals with the class of documents that process
the whole of another document before returning to continue
to process themselves.

3) But TeX is not restricted to \inputting files; it can
also open files for reading, and read them a line at a
time.  It may then elect to process those lines as if
they were TeX source, in which case each file opened would
have to carry markup-language information, and the processing
system would have to switch markup language before and after
processing each line.

Comment : (3) deals with the class of documents that process
other documents on a line-by-line basis.
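The line-by-line mechanism of (3) can be sketched in Plain TeX
as follows; the stream name \markupfile and the macro names
\processlines and \nextline are hypothetical, and "bar.tex" stands
for any file.  Each line is tokenised, using whatever category
codes are in force at that moment, and then executed before the
next line is read, so a "Universal" processor would have to know
the markup language of "bar.tex" at every \read :

	\newread\markupfile                % allocate an input stream
	\openin\markupfile = bar.tex
	\def\processlines{%
	  \ifeof\markupfile                % no more lines (or no such file)
	    \closein\markupfile
	  \else
	    \read\markupfile to \nextline  % tokenise one line, current catcodes
	    \nextline                      % ... and process it as TeX source
	    \expandafter\processlines      % recurse once the \fi has gone
	  \fi}
	\processlines

This is a sketch rather than a robust reader : a line containing
an unbalanced \if... or \fi, for example, would upset the
conditional in which it is executed.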

4) Of course, this is only the tip of the iceberg.
We cannot know, a priori, whether a command \foo, embedded
in a file "bar.tex", is referencing \foo from a document
marked up in American English, French, or any other language
that uses the Unicode characters "f" and "o".  The real
complexities are absolutely horrific, and some very serious
research and investigation would need to be carried out before
this project could move from a "Wouldn't it be nice" to a "This
is feasible and will take $n$ man-millennia to complete" state.

This is not to suggest that we shouldn't start.  But nor
should we underestimate the magnitude of the task that
we are setting ourselves.

Philip Taylor