[texhax] Low-level TeX question: string substitution macro

Toby Cubitt tsc25 at cantab.net
Thu May 31 20:22:27 CEST 2007


Thanks to some very helpful comments from Barbara Beeton (off-list) and 
to the on-list replies, I've now more or less got this working.

Since catcodes are fixed when the characters are first read (short of 
turning tokens into catcode-12 characters with commands like \string and 
\meaning), it seems there's no way to do what I want directly. So instead, 
I first write the unescaped text to a temporary file, then change the 
appropriate catcodes and re-read this temporary file, writing it out again 
to the final destination file. The modified catcodes are in effect when 
the file is re-read, so the special characters become active and expand 
to their escaped form when they're written out again.
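
A minimal sketch of the second pass, for illustration only. The names 
\@esc@in, \@esc@out, \if@esc@more, \@esc@copyfile and \@esc@tmp are 
hypothetical, the sketch assumes LaTeX's \loop ... \repeat, and it only 
escapes the characters \ . [ ] ^ $ that the quoted macro below handles. 
As in that macro, | stands in for the escape character while \ is active.

```tex
\makeatletter
\newread\@esc@in     % hypothetical stream names, not from the post
\newwrite\@esc@out
\newif\if@esc@more

\begingroup
% Use | as a temporary escape character so that \ can be made active
% while the copying macro is being defined.
\catcode`\|=0
|catcode`|\=13 |catcode`|.=13 |catcode`|[=13 |catcode`|]=13
|catcode`|^=13 |catcode`|$=13
% #1 = temporary file (unescaped text), #2 = final file (escaped text)
|gdef|@esc@copyfile#1#2{%
  |begingroup
    % Inside \write, each active character expands to a backslash
    % followed by the character itself, both with catcode 12.
    |def\{|string\|string\}%
    |def.{|string\|string.}%
    |def[{|string\|string[}%
    |def]{|string\|string]}%
    |def^{|string\|string^}%
    |def${|string\|string$}%
    % These catcodes are in force while \read tokenizes each line.
    |catcode`|\=13 |catcode`|.=13 |catcode`|[=13 |catcode`|]=13
    |catcode`|^=13 |catcode`|$=13 |catcode`|#=12 |catcode`|%=12
    |endlinechar=-1
    |openin|@esc@in=#1|relax
    |immediate|openout|@esc@out=#2|relax
    |loop
      |read|@esc@in to|@esc@line
      |ifeof|@esc@in |@esc@morefalse
      |else |@esc@moretrue |immediate|write|@esc@out{|@esc@line}|fi
    |if@esc@more |repeat
    |immediate|closeout|@esc@out
    |closein|@esc@in
  |endgroup}
|endgroup

% Hypothetical usage: pass 1 writes the raw text, pass 2 escapes it.
\newwrite\@esc@tmp
\def\@tmpa{\foobar}
\immediate\openout\@esc@tmp=\jobname.tmp
\immediate\write\@esc@tmp{\expandafter\noexpand\@tmpa}
\immediate\closeout\@esc@tmp
\@esc@copyfile{\jobname.tmp}{\jobname.sed}  % \jobname.sed should hold \\foobar
\makeatother
```

Because \read tokenizes each line under whatever catcodes are current, 
the active definitions apply even though the text was originally 
tokenized much earlier. Braces keep their usual catcodes, so a line with 
unbalanced braces makes \read carry on into the following line.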

The only thing holding me back from dispensing with the temporary file 
is that I can't figure out how to write a newline character to an 
external file. None of the following seem to work:

\write\@stream{^^M}
\write\@stream{\\}
{\lccode`|=13 \lowercase{\write\@stream{|}}}

Is there some way to write out an explicit newline? Please don't just 
tell me I could do it by writing the file one line at a time. That's 
what I'm doing at the moment, but it requires the temporary file. I have 
to loop through the temporary file, reading a line from it and 
immediately re-writing it (with escapes expanded) to the final file. I 
could store the text to be written in a macro that is appended to on each 
iteration, and write it all out to the file at the very end. But then I need
to insert the newlines manually into the macro so that they appear in 
the file when it's written out, hence my question.
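
For reference, a sketch of that accumulate-then-write approach using 
TeX's \newlinechar parameter: while \newlinechar holds a character code, 
each occurrence of that character in \write output starts a new line. The 
names \@esc@sed, \@esc@buffer and \@esc@addline are hypothetical, and the 
two sed rules are placeholders.

```tex
\makeatletter
\newwrite\@esc@sed          % hypothetical stream name
\def\@esc@buffer{}          % hypothetical accumulator macro

% Append #1 and a ^^J separator to the buffer without expanding #1,
% so any active characters in it are only expanded by the final \write.
\def\@esc@addline#1{%
  \expandafter\def\expandafter\@esc@buffer\expandafter{\@esc@buffer#1^^J}}

\@esc@addline{s/foo/bar/}
\@esc@addline{s/baz/qux/}

% While \newlinechar is 10 (^^J), every ^^J in the text being written
% ends the current output line.
\begingroup
  \newlinechar=`\^^J
  \immediate\openout\@esc@sed=\jobname.sed
  \immediate\write\@esc@sed{\@esc@buffer}  % the two rules on separate lines
  \immediate\closeout\@esc@sed
\endgroup
\makeatother
```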


In answer to Donald Arseneau's comments: I realise TeX's file 
input/output features aren't designed for dealing with anything other 
than files containing TeX source. But the file I'm writing *is* mostly 
TeX code. The sed script contains rules for replacing one sequence of 
LaTeX commands with another. The LaTeX commands to be replaced aren't 
known until the LaTeX source file is processed, so I *have* to write out 
at least some of the information from within TeX. Given that I have to 
write something from TeX, I might as well write the entire sed script 
from TeX if I can.
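
A sketch of emitting one such rule directly from TeX, assuming the 
default \escapechar, a sed in which \\ in a regexp matches a literal 
backslash, and hypothetical names \@esc@rules, \@esc@bs and \@esc@rule. 
The helper takes command names without their leading backslash.

```tex
\makeatletter
\newwrite\@esc@rules   % hypothetical stream name
% A catcode-12 backslash: \string\\ yields the two characters \\,
% and \@gobble discards the first of them.
\edef\@esc@bs{\expandafter\@gobble\string\\}
% Hypothetical helper: #1 = command to replace, #2 = replacement command.
\def\@esc@rule#1#2{%
  \immediate\write\@esc@rules{s/\@esc@bs\@esc@bs#1/\@esc@bs\@esc@bs#2/g}}

\immediate\openout\@esc@rules=\jobname.sed
\@esc@rule{foobar}{newcmd}   % writes: s/\\foobar/\\newcmd/g
\immediate\closeout\@esc@rules
\makeatother
```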

Finally, in reply to Michael Doob: I now think that writing a Perl 
script instead of sed would only make things slightly simpler. I would 
still need to escape the "\" character inside Perl strings when writing 
the script file from TeX, and we're back to my original problem :) By 
the way, awk can also be made to escape special characters in a string 
prior to using it as a computed regexp, though not in quite so simple a 
way as Perl. But I seem to have it working with sed now, anyway.

Thanks for everyone's help, and I hope someone can shed similar light on 
my final dilemma.

Toby


Toby Cubitt wrote:
> I'm trying to write an internal macro that does string substitution, in
> order to escape certain characters in the string before writing it to a
> file. (The package is supposed to be writing a sed script, so I need to
> escape characters that have a special meaning in regular expressions.)
> 
> If this was a user-level macro to be used in the LaTeX source itself, I
> think I can see how it could be done, by changing the catcodes of the
> characters to be escaped to 13 (active character), then defining these
> active characters to expand to escaped versions of themselves. (I
> suppose this would be somewhat akin to LaTeX's \verb command). The
> trouble is, this macro is to be used in a LaTeX package, and I need
> something like the following to work:
> 
> 
> \begingroup%
> \catcode`|=0
> |catcode`.=13 |catcode`[=13 |catcode`]=13
> |catcode`^=13 |catcode`$=13 %$
> \catcode`\\=13
> |gdef|@escapechars#1{%
>    |begingroup
>    |catcode`|=0
>    |catcode`.=13 |catcode`[=13 |catcode`]=13
>    |catcode`^=13 |catcode`$=13 %$
>    |catcode`\=13
>    |def\{|string\|string\}%
>    |def^{|string\|string^}%
>    |def${|string\|string$}%
>    |def.{|string\|string.}%
>    |def[{|string\|string[}%
>    |def]{|string\|string]}%
>    #1|endgroup%
> }
> |endgroup%
> \def\@tmpa{\foobar}
> \expandafter\@escapechars\expandafter{\@tmpa}%
> 
> 
> It seems I need those \catcode changes outside the macro definition, as
> well as inside, otherwise the |endgroup and |catcode changes inside the
> macro aren't recognized properly, though I don't entirely understand the
> reason behind this. In reality, the \@tmpa macro is of course defined by 
> a much more complicated process than a simple \def (otherwise the whole 
> exercise becomes trivial!), but the above serves to illustrate the scenario.
> 
> This code is supposed to change the "\foobar" into "\\foobar", but 
> instead it fails with an "Undefined control sequence \foobar" error. If 
> I understand this correctly (unlikely!), the problem is that the "\" in 
> "\foobar" already has catcode 0 (escape character) before it's absorbed 
> by \@escapechars, so TeX expands #1 into "\foobar" with the catcodes 
> already assigned, the catcode changes inside the \@escapechars macro 
> have no effect, and TeX tries to interpret "\foobar" as a control
> sequence. Is this at all correct?
> 
> Is there any way to do what I want? If my above analysis is correct,
> what I guess I need is a command to change the catcodes of tokens, but
> TeX's abilities in this respect seem to be limited. The \string and
> \meaning commands can only change tokens to catcode 12 (other), and the
> \lowercase command changes charcodes rather than catcodes. Maybe there's 
> a completely different way of achieving what I want?
> 
> I've tried to reduce this question to its bare essentials, but if it's
> not clear what I'm trying to do, I can go into more detail.
> 
> Thanks very much,
> 
> Toby
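
For comparison, a minimal sketch of the user-level variant the quoted 
message mentions, where the catcodes are changed before the argument is 
read (the trick \verb relies on). For brevity only \ and . are handled, 
and \escapechars and \@escapechars@i are hypothetical names. Like \verb, 
it only works when used directly in the source, not inside another 
macro's argument, and not on text already stored in a macro such as 
\@tmpa.

```tex
\makeatletter
\begingroup
% As in the quoted code, | is a temporary escape character.
\catcode`\|=0
|catcode`|\=13 |catcode`|.=13
% \escapechars opens a group, makes \ and . active, and only then
% lets \@escapechars@i read its argument, so the argument is
% tokenized with the new catcodes already in force.
|gdef|escapechars{%
  |begingroup
  |def\{|string\|string\}%
  |def.{|string\|string.}%
  |catcode`|\=13 |catcode`|.=13
  |@escapechars@i}
% Stream 16 is not normally open, so this writes to the terminal and
% log; \endgroup then restores the normal catcodes.
|gdef|@escapechars@i#1{|immediate|write16{#1}|endgroup}
|endgroup
\makeatother

\escapechars{\foobar.tex}  % should show \\foobar\.tex
```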

