[XeTeX] turn off special characters in PDF
Ross Moore
ross.moore at mq.edu.au
Mon Dec 30 00:45:39 CET 2013
Hi Joe,
On 30/12/2013, at 8:12 AM, Joe Corneli wrote:
> This answer talks about how to turn off ligatures:
> http://tex.stackexchange.com/a/5419/4357
>
> Is there a way to turn off *all* special characters (e.g. small caps)
> and just get ASCII characters in the copy-and-paste level of the PDF?
In short, no! Doing so goes against the idea of making more use
of Unicode across all computing platforms.
Certainly a ligature can have an /ActualText replacement consisting
of the separate characters, but this requires the PDF producer
to have supplied this within the PDF, as it is being generated.
I've played a lot with this kind of thing, and think that reducing
everything to ASCII is the wrong approach. One should use /ActualText
to provide the correct Unicode replacement, when one exists. Thus one
can extract textual information reliably, even when the PDF
uses legacy fonts that may not contain a /ToUnicode resource,
or if that resource is inadequate in special situations.
Besides, do you really mean *all* special characters?
What about simple symbols like: ß∑∂√∫Ω and all the other
myriad foreign/accented letters and mathematical symbols?
If you want these to Copy/Paste as TeX coding (\ss \sum \partial
\sqrt etc.) within documents that you write yourself, then I wrote
a package called mmap where this is an option for the original
Computer Modern fonts.
Alternatively, a PDF reader might supply a filtering mode that
converts the ligatures back to separate characters. Then the
user ought to be able to choose whether or not to use this filter.
I don't know of any that actually do this.
(In any case, you would want such a tool to allow you to specify
which characters to replace, and which to preserve.)
Your best option is surely to (get someone else to) write such
a filter that meets your needs, and use it to post-process the text
extracted via Copy/Paste or with other text-extraction tools.
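
As a rough illustration of such a post-processing filter, here is a minimal
Python sketch. It assumes that the text has already been extracted from the
PDF, and that "special character" means a caller-chosen set of Unicode
characters. The names expand_ligatures and DEFAULT_LIGATURES are hypothetical,
made up for this example. The default set covers only the Latin ligature block
U+FB00 to U+FB06, and each selected character is replaced by its Unicode
compatibility decomposition (NFKD). Everything else is left untouched.

```python
import unicodedata

# Hypothetical default: the Latin ligatures U+FB00..U+FB06 (ff, fi, fl, ffi, ffl, ...).
DEFAULT_LIGATURES = frozenset(chr(cp) for cp in range(0xFB00, 0xFB07))


def expand_ligatures(text, replace=DEFAULT_LIGATURES):
    """Expand only the characters listed in `replace`.

    Each selected character is replaced by its NFKD compatibility
    decomposition; all other characters pass through unchanged.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch) if ch in replace else ch
        for ch in text
    )


if __name__ == "__main__":
    import sys

    # Read extracted text on stdin, write the filtered text to stdout.
    sys.stdout.write(expand_ligatures(sys.stdin.read()))
```

Passing a different set as the second argument lets you decide which
characters to replace and which to preserve. The NFKD mapping is only one
possible choice of replacement; an explicit lookup table would work as well.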
Of course this is no use if your aim is to create documents for
which others get the desired result via Copy/Paste.
For this, the /ActualText approach is what you need.
Hope this helps,
Ross
------------------------------------------------------------------------
Ross Moore ross.moore at mq.edu.au
Mathematics Department office: E7A-206
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
------------------------------------------------------------------------