[XeTeX] Ligatures and searching in PDFs
Joel C. Salomon
joelcsalomon at gmail.com
Sun May 16 21:02:17 CEST 2010
On 05/10/2010 03:36 AM, Janusz S. Bień wrote:
> On Mon, 10 May 2010 Paul Foley <paul at mises.com> wrote:
>> Try the following:
>>
>> \documentclass{article}
>> \usepackage{xltxtra}
>> \setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}
>>
>> \begin{document}
>> Fifty afflicted fjords.
>> \end{document}
>>
>> Load the PDF, and search for any of the words.
>>
>> The "fty", "ct" and "fj" ligatures aren't in Unicode, and the private-use
>> characters obviously can't be decomposed by the PDF viewer. The same
>> problem will obviously occur for variant letter shapes, old-style digits,
>> etc.
>
> The proper solution would be to use the /ActualText feature of the PDF
> specification.
IIRC, the proper solution is for the font to have an OpenType table that
links arbitrary ligature glyphs to the character strings they represent
(ligature decomposition). If, for example, the “fty” ligature has been
(improperly) encoded in the Unicode PUA, that will make this solution harder.
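
For comparison, the /ActualText route mentioned above can be tried by hand.
This is only a rough sketch and untested here: it assumes the accsupp package
is installed and that its dvipdfm driver option works with XeTeX's xdvipdfmx
back end. \searchable is a made-up helper name, not part of any package.

```latex
% Sketch only. Assumptions: accsupp is available, and its dvipdfm driver
% option produces specials that xdvipdfmx (XeTeX's back end) accepts.
\documentclass{article}
\usepackage{xltxtra}
\usepackage[dvipdfm]{accsupp}
\setmainfont[Mapping=tex-text,Numbers=OldStyle,Ligatures={Required,Common,Rare}]{Junicode}

% \searchable is a hypothetical helper: it typesets its argument and tags
% the result with /ActualText, so the viewer sees the plain characters
% even where the font substitutes a ligature glyph.
\newcommand*{\searchable}[1]{%
  \BeginAccSupp{method=escape,ActualText={#1}}#1\EndAccSupp{}%
}

\begin{document}
\searchable{Fifty} \searchable{afflicted} \searchable{fjords}.
\end{document}
```

Each whole word is wrapped, rather than just the ligated letters, so the
tagging does not split the word and interfere with ligature formation.
Whether a given PDF viewer honours /ActualText when searching is a separate
question.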
—Joel Salomon