[texhax] Can I Parsing with TeX ?
Toby Cubitt
tsc25 at cantab.net
Wed Aug 15 13:54:02 CEST 2007
wa2n wrote:
> Hi all
> Can TeX doing parsing or regexp ?
>
> or the question is
> how can I replace something in tex source (.tex) while I'm compiling the
> source (make it .dvi) ?
The short answer is: not really. The long answer is: perhaps, depending
on what exactly you want to do, but coding it will probably involve a
lot of pain. You're almost certainly better off using an external
utility designed for this task, such as sed, awk, or perl.
However, if you really, really can't avoid doing it in TeX, here's my
understanding of why it's so difficult (others will probably be able to
correct/improve on this). TeX is a macro language, which means it works
simply by expanding macros, expanding the results of that expansion,
expanding the results of that, and so on until there's nothing left to
expand (i.e. the expression contains only primitive or unexpandable
tokens). When TeX reads a file, it takes the file one line at a time,
converts the characters of each line into tokens according to the
category codes ("catcodes") in force at that moment, then expands these
tokens as necessary. There's a better description of this whole process
in the book "TeX by Topic" (available online).
To get TeX to replace one string (that matches a regexp, say) with
another, you would have to make that string expandable in some way. This
would probably involve changing the catcodes of those characters, since
normal letters are otherwise parsed as single, unexpandable tokens.
However, the catcodes take effect when the input stream is parsed, so
you'd have to change the catcodes before the string you intend to
replace gets parsed. If you wanted to do general string replacement on
an input file, you'd essentially have to turn TeX into a
string-replacement machine by redefining catcodes and defining the
necessary macros right at the start, before the file gets read into the
input stream. This might be possible in principle, but it will
undoubtedly be difficult. Then you'll need to undo all this messing
around with TeX internals, and feed everything back into TeX to have it
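process the string-replaced file normally. However, a token keeps the
catcode it was given when it was first read, so reverting the catcodes
afterwards does not change tokens that already exist. The simplest way to
fully "reset" them is to write the string-replaced text to disk, revert
the catcodes, then re-read the file (e-TeX's \scantokens can do the
equivalent without a temporary file).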
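
To illustrate the timing issue described above, here is a minimal plain
TeX sketch. The macro names \early and \late and the choice of "@" as the
character to replace are only assumptions for the example; "@" is an
ordinary character (catcode 12) in plain TeX by default.

```tex
% "@" is an ordinary character while this definition is read, so the "@"
% stored inside \early is frozen as an ordinary character token.
\def\early{user@host}

\catcode`\@=\active   % affects only characters tokenized from now on
\def@{ at }%          % replacement text for the active "@"

\def\late{user@host}  % here "@" is read as an active character

\early  % typesets "user@host": the token read earlier is unaffected
\par
\late   % typesets "user at host"
\bye
```

The catcode change only reaches text that TeX has not yet turned into
tokens, which is why the change has to come before the text to be
replaced is read.
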
I expect you'd rather not have to code all this :)
If you want to see an example of this kind of thing in action, you could
have a look at the code that implements the "poorman" option in my
"cleveref" package (on CTAN). I used a very simplified form of the above
to do very, very simple string replacement: replacing certain single
characters in a file with "escaped" versions of those characters. As
you'll see if you look at the code, it works by changing the catcodes of
the relevant characters to turn them into active characters (single
characters that TeX expands into something else), reading the file into
the TeX input stream and processing it, thereby replacing the active
characters with their expansions, and writing the result back out to file.
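
The steps just described can be sketched roughly as follows in plain TeX.
This is not the cleveref code, just an illustration under simplifying
assumptions: the file names demo-in.tex and demo-out.tex, the names \src,
\dst, \curline and \ifmore, and the replacement of "@" by " at " are all
made up for the example, and the input is assumed to contain only plain
text (no backslashes, braces or percent signs).

```tex
\newread\src
\newwrite\dst
\newif\ifmore

% Create a small input file so the sketch is self-contained.
\immediate\openout\dst=demo-in.tex
\immediate\write\dst{Mail someone@example.org today.}
\immediate\closeout\dst

\immediate\openout\dst=demo-out.tex
\openin\src=demo-in.tex
\begingroup
  \catcode`\@=\active   % "@" is tokenized as an active character from now on
  \def@{ at }%          % its expansion is the replacement text
  \endlinechar=-1       % do not append a space at the end of each line read
  \loop
    \read\src to \curline          % tokenize one line with the new catcodes
    \ifeof\src \morefalse \else \moretrue \fi
  \ifmore
    \immediate\write\dst{\curline} % \write expands the active "@" on output
  \repeat
\endgroup                % catcode and definition of "@" are restored here
\closein\src
\immediate\closeout\dst

\input demo-out.tex      % re-read the replaced text with normal catcodes
\bye
```

Anything beyond single-character replacement (let alone regexps) would
need considerably more machinery than this.
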
Hope that helps. I'm still a relative novice at TeX programming, so some
of what I said might be inaccurate, but TeXperts on this list should be
able to correct it.
Toby Cubitt