[XeTeX] xdvipdfmx, line breaks and hyphenated words
Pablo Rodríguez
oinos at web.de
Sun Jan 14 14:06:50 CET 2007
William Adams wrote:
> On Jan 12, 2007, at 2:02 PM, Pablo Rodríguez wrote:
>
>> One of the things I think would be interesting to implement in
>> xdvipdfmx is a feature that Adobe-generated documents (such as
>> http://pdf.codev2.cc/Lessig-Codev2.pdf) have: the searchable text
>> contains no line breaks (within the same paragraph), and hyphenated
>> words are not hyphenated in the underlying text.
>
> I'm not fully understanding what you're saying here.
Thanks for your answer, William. Sorry, I expressed myself badly, and
I'm afraid I chose the wrong example. The right one is
http://www.free-culture.cc/freeculture.pdf.
> There's a hyphen ``transla-
> tion'' on the first line of the book (pg.ix).
Take “soft-ware” on page xiii: Acrobat does not see a hyphen there, and
copying the text yields the unhyphenated word “software”.
> If you mean that Acrobat allows a search for ``translation'' to find
> ``transla-
> tion'', well that works for TeX documents too --- Adobe simply chose
> to ignore / stitch together parts of words on different lines
> separated by a hyphen.
In my experience, searching for “translation” in Acrobat does not find
“transla-tion” on page ix of Lessig-Codev2.pdf, but searching for
“software” does find “soft-ware” on page xiii of freeculture.pdf. I think
Acrobat finds it because of the way that PDF document was generated, not
because of a general feature like the one you described.
I uncompressed freeculture.pdf and opened it in vim, but I could not
find the reason why Acrobat finds “soft-ware” when searching for
“software”. That is no surprise, since I know nothing about the PDF
specification.
Thanks for your help,
Pablo