[texhax] An expected yet very surprising behavior of TeX.

Paul Isambert zappathustra at free.fr
Thu Oct 17 13:47:47 CEST 2013


Hello all,

I’ve encountered a behavior in TeX that I find very puzzling; it is not a
bug per se since it follows TeX rules (plus, of course, TeX has no bug :) ),
but I definitely can’t make sense of it.

It involves insertions and the pagebreaking algorithm, so I’ll describe them
here summarily to outline the relevant points (the description is extremely
simplified and omits many details).

When TeX considers breaking a page at a legal breakpoint, it calculates the
cost of breaking there as follows (see TeXbook, p.111):

1. cost = penalty at that point if it is <= -10000;
2. cost = badness of the page if broken there + penalty + \insertpenalties
   otherwise.

In case 1, or if TeX encounters a breakpoint that would produce an overfull
page, TeX breaks the page at the best remembered breakpoint.

If there is an insertion, and if that insertion can’t fit on the page and must
be split (because it would take too much of its allocated space), then the
penalty at which it is split is recorded in \insertpenalties, mentioned in
case 2.

Now the strange behavior is that breaking might be triggered by case 1 (a very
strong negative penalty), yet performed at an earlier breakpoint because
\insertpenalties is very strong too.

As an example:

    % EXAMPLE
    \tracingpages=1 % So we’ll see the page being built.

    % New insertion class; the third line means it is only allowed to occupy
    % one \baselineskip on the page (so it’ll be split). 
    \newinsert\myins
    \count\myins=1000
    \dimen\myins=\baselineskip

    % Basic, uninteresting output.
    \output{%
      \shipout\vbox{%
        \ifvoid\myins
        \else
          \box\myins
          \hrule
        \fi
        \box255}%
      }

    % An insertion: it will be split after the first line.
    \insert\myins\bgroup
    ins 1\par
    \penalty-11000
    ins 2\par
    \egroup

    line 1
    \vfil
    \penalty0

    line 2
    \vfill
    \penalty-10000

    \bye
    % END OF EXAMPLE

Now, you’d think the page would be broken at the last penalty, and that line 1
and line 2 would end up at the top of the page because of the \vfill; but that
is not what happens: the break occurs at the \vfill because there the cost is

    badness (=0) + penalty (=0) + \insertpenalties (=-11000) = -11000

and that is lower (i.e. better) than the -10000 of the big penalty; since the
\vfill is discarded, line 2 is flushed to the bottom of the page because of the
preceding \vfil. Note that the break is still triggered by \penalty-10000, but
that is not the best breakpoint.
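
The comparison can be replayed with a toy macro. This is only a sketch of the
simplified rule given above: the names \pagecost and \cost are made up for the
illustration, and the real algorithm has more cases (for instance pages whose
badness is 10000).

    % SKETCH (hypothetical helper, not part of TeX)
    \newcount\cost
    % #1 = badness, #2 = penalty, #3 = \insertpenalties
    \def\pagecost#1#2#3{%
      \ifnum#2>-10000
        % case 2: ordinary breakpoint
        \cost=#1 \advance\cost by #2 \advance\cost by #3
      \else
        % case 1: forced breakpoint, \insertpenalties is ignored
        \cost=#2
      \fi}

    \pagecost{0}{0}{-11000}
    \message{cost at the \string\vfill: \the\cost}
    \pagecost{0}{-10000}{-11000}
    \message{cost at the forced penalty: \the\cost}
    \bye
    % END OF SKETCH

The two messages report -11000 and -10000 respectively, so the glue has the
lower cost and is the one remembered as the best breakpoint.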

If we look at what \tracingpages returns, we have:

%% goal height=643.20255, max depth=4.0
TeX splits the insert and sets \insertpenalties to -11000:
%split252 to 12.0,6.67859 p=-11000
% t=0.0 g=636.52396 b=10000 p=0 c=100000#
% t=10.0 g=636.52396 b=10000 p=0 c=100000#
% t=10.0 plus 1.0fil g=636.52396 b=0 p=0 c=-11000#
TeX considers breaking at the \vfill (and discarding it), and that is indeed
the best breakpoint, but it doesn’t trigger breaking yet:
% t=22.0 plus 1.0 plus 1.0fil g=636.52396 b=0 p=0 c=-11000#
TeX considers breaking at \penalty-10000, which triggers the break but is not
the best breakpoint:
% t=22.0 plus 1.0 plus 2.0fil g=636.52396 b=0 p=-10000 c=-10000

What I find very strange seems obvious to me, but I’ll state it again
nonetheless: if there were no split insertion, breaking at the penalty would
be better than breaking at the \vfill; the split insertion reverses that,
although it is (in this case) totally irrelevant to what happens in the main
text.

The obvious modification to the algorithm would be turning case 1 from

    cost = penalty at that point if it is <= -10000;

to

    cost = penalty + \insertpenalties at that point if it is <= -10000;

so that \insertpenalties is taken into account (it is important), but adds its
weight equally to all breakpoints, and doesn’t turn what we’re all used to
seeing as forced breakpoints into mere possible ones.
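
As a rough check of what the modified rule would give on the example, here is
a toy macro. It only models the simplified rule; \proposedcost and \cost are
made-up names, and nothing like this exists in TeX itself.

    % SKETCH (hypothetical helper, models the proposed rule only)
    \newcount\cost
    % #1 = badness, #2 = penalty, #3 = \insertpenalties
    \def\proposedcost#1#2#3{%
      % \insertpenalties is added in both cases
      \cost=#2 \advance\cost by #3
      % the badness only counts at ordinary breakpoints
      \ifnum#2>-10000 \advance\cost by #1 \fi}

    \proposedcost{0}{0}{-11000}
    \message{proposed cost at the \string\vfill: \the\cost}
    \proposedcost{0}{-10000}{-11000}
    \message{proposed cost at the forced penalty: \the\cost}
    \bye
    % END OF SKETCH

The messages report -11000 for the glue and -21000 for the forced penalty, so
under the modified rule the forced penalty would again be the best breakpoint.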

But perhaps there is some good reason why TeX works as it does, and somebody
here among our best wizards can enlighten me...?

(Note that the problem can be easily circumvented; yet I believe it should not
exist at all!)
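
One possible workaround, given only as a sketch under the simplified rules
above: make the final penalty lower than what has accumulated in
\insertpenalties. The value -20000 is an arbitrary choice for this
illustration; since -20000 is lower than -11000, the last penalty should then
be both the trigger and the best breakpoint.

    % SKETCH OF A WORKAROUND (same file as the example, last penalty changed)
    \tracingpages=1

    \newinsert\myins
    \count\myins=1000
    \dimen\myins=\baselineskip

    \output{%
      \shipout\vbox{%
        \ifvoid\myins
        \else
          \box\myins
          \hrule
        \fi
        \box255}%
      }

    \insert\myins\bgroup
    ins 1\par
    \penalty-11000
    ins 2\par
    \egroup

    line 1
    \vfil
    \penalty0

    line 2
    \vfill
    \penalty-20000 % assumption: anything below the -11000 of the split

    \bye
    % END OF SKETCH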

Best,
Paul


