[XeTeX] xdvipdfmx, line breaks and hyphenated words
William Adams
will.adams at frycomm.com
Fri Jan 12 21:02:32 CET 2007
On Jan 12, 2007, at 2:02 PM, Pablo Rodríguez wrote:
> one of the things that I think it would be interesting to implement in
> xdvipdfmx is one feature that Adobe generated documents (such as
> http://pdf.codev2.cc/Lessig-Codev2.pdf) have: that the searchable text
> contains no line breaks (within the same paragraphs) and hyphenated
> words aren't hyphenated in the text within.
I'm not fully understanding what you're saying here.
There's a hyphen ``transla-
tion'' on the first line of the book (pg.ix).
The book was created w/ Quark XPress v7, so keeping hyphenated
compounds from being hyphenated any further was probably done by
manually inserting a discretionary hyphen at the beginning of such
compounds (that's how it's done in QXP 6.5 and earlier --- see a recent
post on this to comp.text.tex by yours truly about having to do it by hand).
AFAIK TeX won't hyphenate a word which contains a hyphen. I'm not
sure whether this is changed in LaTeX or not. If it's not, it's easy enough
to introduce an ``\allowbreak'' where needed. You could do a variation of what
I do, searching for all instances of ``-'' and replacing those which
warrant it w/ ``-\allowbreak '' or ``-\allowbreak%
''.
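The search-and-replace pass described above could be scripted roughly as follows. This is only a sketch in Python: the helper name `add_allowbreak` and the rule "a single hyphen between two letters" are assumptions made for illustration, not anything from my own workflow. A script can't judge which hyphens actually warrant a break (and it will also touch hyphens inside things like label or file names), so the result still needs checking by hand.

```python
import re

# Hypothetical helper: append \allowbreak after a single hyphen that sits
# directly between two letters. Dashes typed as -- or --- are left alone
# because their hyphens are not surrounded by letters on both sides.
_INTRAWORD_HYPHEN = re.compile(r"(?<=[A-Za-z])-(?=[A-Za-z])")


def add_allowbreak(tex_source: str) -> str:
    # A replacement function is used so the backslash in \allowbreak is
    # inserted literally rather than interpreted as a regex escape.
    return _INTRAWORD_HYPHEN.sub(lambda m: "-\\allowbreak ", tex_source)


if __name__ == "__main__":
    print(add_allowbreak("a well-known rule --- and a non-trivial one"))
```

The trailing space after ``\allowbreak'' is swallowed by TeX as the end of the control word, so no extra space shows up in the output.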
If you mean that Acrobat allows a search for ``translation'' to find
``transla-
tion'', well, that works for TeX documents too --- Adobe simply chose
to ignore the line-end hyphen and stitch together the parts of a word
which are split across lines.
Try it:
\documentclass{minimal}
\begin{document}
\noindent Transla-\\
tion
\end{document}
This has some unintended consequences though: ``compound-
interest'' will be found even when one is searching for
``compoundinterest''. (If I could, I'd think of a good example word
pair where that would make a difference.)
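To make the side effect concrete, here is a small Python sketch of the kind of stitching I am assuming the search does. How Acrobat really does it isn't documented anywhere in this thread; the function name `stitch` and the rule "drop a hyphen at the end of a line and join the two fragments" are guesses for illustration only.

```python
import re

# Assumed rule (NOT Adobe's documented behaviour): a hyphen at the very end
# of a line, followed by a word continuing on the next line, is dropped and
# the two fragments are joined into one searchable word.
_LINE_END_HYPHEN = re.compile(r"(\w)-\n\s*(\w)")


def stitch(extracted_text: str) -> str:
    return _LINE_END_HYPHEN.sub(r"\1\2", extracted_text)


if __name__ == "__main__":
    print(stitch("transla-\ntion"))        # a word broken by hyphenation
    print(stitch("compound-\ninterest"))   # a real compound broken at its hyphen
```

Under that rule the two cases can't be told apart: the hyphen inserted by the line breaker and the hyphen which belongs to the compound are both thrown away.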
William
--
William Adams
senior graphic designer
Fry Communications