[XeTeX] turn off special characters in PDF
Joe Corneli
holtzermann17 at gmail.com
Mon Dec 30 17:51:46 CET 2013
Thanks Ross.
I think in this case all I really need is to revise \href code to
insert /ActualText (because I'm using small caps for hyperlinks in
this doc). Pretty much everything else works fine already.
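
A minimal sketch of what I have in mind, in case it is useful to anyone else. It assumes the accsupp package is acceptable here and that it selects its dvipdfm driver when run under XeTeX (if it does not, passing the dvipdfm option explicitly should do it). The macro name \schref is made up for illustration, and the replacement text is assumed to be plain ASCII:

```latex
\documentclass{article}
\usepackage{fontspec}
\usepackage{accsupp}   % provides \BeginAccSupp / \EndAccSupp
\usepackage{hyperref}

% Hypothetical wrapper around \href (not part of hyperref):
%   #1 = URL, #2 = link text
% The link text is typeset in small caps, while /ActualText carries
% the same text unchanged, so that Copy/Paste should yield the
% ordinary characters rather than the small-caps glyphs.
\newcommand*{\schref}[2]{%
  \href{#1}{%
    \BeginAccSupp{method=escape,ActualText={#2}}%
    \textsc{#2}%
    \EndAccSupp{}%
  }%
}

\begin{document}
See \schref{http://tug.org/mailman/listinfo/xetex}{the XeTeX list page}.
\end{document}
```

I have not checked how this behaves when a link breaks across lines or pages, so treat it as a starting point only.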
Joe
On Sun, Dec 29, 2013 at 11:45 PM, Ross Moore <ross.moore at mq.edu.au> wrote:
> Hi Joe,
>
> On 30/12/2013, at 8:12 AM, Joe Corneli wrote:
>
>> This answer talks about how to turn off ligatures:
>> http://tex.stackexchange.com/a/5419/4357
>>
>> Is there a way to turn off *all* special characters (e.g. small caps)
>> and just get ASCII characters in the copy-and-paste level of the PDF?
>
> In short, no!
> — because this is against the idea of making more use of Unicode,
> across all computing platforms.
>
> Certainly a ligature can have an /ActualText replacement consisting
> of the separate characters, but this requires the PDF producer
> to have supplied this within the PDF, as it is being generated.
>
> I've played a lot with this kind of thing, and think that this
> is the wrong approach. One should use /ActualText to provide
> the correct Unicode replacement, when one exists. Thus one
> can extract textual information reliably, even when the PDF
> uses legacy fonts that may not contain a /ToUnicode resource,
> or if that resource is inadequate in special situations.
>
>
> Besides, do you really mean *all* special characters?
> What about simple symbols like: ß∑∂√∫Ω and all the other
> myriad foreign/accented letters and mathematical symbols?
>
> If you want these to Copy/Paste as TeX coding (\beta \sum \partial
> \sqrt etc.) within documents that you write yourself, then I wrote
> a package called mmap where this is an option for the original
> Computer Modern fonts.
>
>
> Alternatively, a PDF reader might supply a filtering mode that
> converts the ligatures back to separate characters. Then the
> user ought to be able to choose whether or not to use this filter.
> I don't know of any that actually do this.
> (In any case, you would want such a tool to allow you to specify
> which characters to replace, and which to preserve.)
>
>
> Your best option is surely to (get someone else to) write such
> a filter that meets your needs, and use it to post-process the text
> extracted via Copy/Paste or with other text-extraction tools.
>
> Of course this is no use if your aim is to create documents for
> which others get the desired result via Copy/Paste.
> For this, the /ActualText approach is what you need.
>
>
>
> Hope this helps,
>
> Ross
>
> ------------------------------------------------------------------------
> Ross Moore ross.moore at mq.edu.au
> Mathematics Department office: E7A-206
> Macquarie University tel: +61 (0)2 9850 8955
> Sydney, Australia 2109 fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------