[XeTeX] Ligatures and searching in PDFs
Janusz S. Bień
jsbien at mimuw.edu.pl
Mon May 10 09:36:33 CEST 2010
On Mon, 10 May 2010 Paul Foley <paul at mises.com> wrote:
>
> Try the following:
>
> \documentclass{article}
> \usepackage{xltxtra}
> \setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}
>
> \begin{document}
> Fifty afflicted fjords.
> \end{document}
>
> Load the PDF, and search for any of the words.
>
> The "fty", "ct" and "fj" ligatures aren't in Unicode, and the private-use
> characters obviously can't be decomposed by the PDF viewer. The same
> problem will obviously occur for variant letter shapes, old-style digits,
> etc.
>
> But scanned documents in PDF often have an invisible text layer attached
> which can be searched, etc.; is it possible to use the same technique to put
> the decomposed letters over the visible private-use characters, so that
> documents remain searchable (and copy/paste-able)?
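The proper solution would be to use the /ActualText feature of the PDF
specification, which attaches replacement text to a span of page content,
so that a viewer can search and copy the decomposed letters instead of the
private-use glyphs.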
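
A minimal sketch of how this might look for the test document above. It is
untested and rests on several assumptions: that the accsupp package is
installed and selects its dvipdfm-style driver when run under XeTeX, that
the ligatures still form inside the marked spans, and that the PDF viewer
honours /ActualText when searching and copying. The \searchable macro is a
hypothetical helper introduced only for this illustration.

```latex
\documentclass{article}
\usepackage{xltxtra}
% Assumption: accsupp picks a driver suitable for XeTeX (xdvipdfmx).
\usepackage{accsupp}
\setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}

% Hypothetical helper: typeset #1 as usual (so the font may substitute a
% ligature glyph) and record the same letters as /ActualText for the span.
\newcommand*{\searchable}[1]{%
  \BeginAccSupp{ActualText={#1}}#1\EndAccSupp{}}

\begin{document}
Fi\searchable{fty} affli\searchable{ct}ed \searchable{fj}ords.
\end{document}
```

Marking every ligature by hand does not scale beyond a test file; doing it
automatically would need support in the font-loading or driver layer.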
Best regards
Janusz
--
dr hab. Janusz S. Bien, prof. UW - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - Warsaw University (Department of Formal Linguistics)
jsbien at uw.edu.pl, jsbien at mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/