[XeTeX] search arabic text in pdf using adobe reader 7.0
François Charette
firmicus at ankabut.net
Wed Feb 6 10:00:47 CET 2008
sh wrote:
> I am using MiKTeX-XeTeX 2.7.2904 (0.997 svn 539) (MiKTeX 2.7) in
> Microsoft Windows XP . I have been successful in creating pdf using
> arabxetex and Scheherazade opentype font.
>
> I am using adobe reader 7.0 to read the pdf. When I copy the arabic
> characters from the pdf, I get garbage characters when I paste it to
> MS Word (which is set to use the unicode Arial MT font). The
> individual characters copy just fine, but the characters that are in
> the intermediate form do not get copied.
>
> What do I need to do to be able to copy the characters from the pdf? The
> pdf is encoded with identity-H/CID. I suspect I need to do something
> with Cmap or mapping?
>
This seems to be an issue (not only for copying but also for searching)
with the font Scheherazade, which also occurs when it is typeset with
plain xetex (and so is not related to your operating system or your PDF
viewer). In fact, only *isolated* characters can be correctly copied or
searched; the other characters come out, as you say, as "garbage"
(actually as characters with code points above U+100000, in the
so-called "Supplementary Private Use Area-B" of Unicode). I suppose
Jonathan should be able to tell us more about this...
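
To check whether text pasted from a PDF contains such private-use
characters, something like the following Python sketch could be used. It
is only an illustration: the function names are made up for this example,
and it merely reports code points that fall in the Supplementary Private
Use Area-B block (U+100000 to U+10FFFF); it does not recover the intended
Arabic letters.

```python
# Minimal sketch: flag characters copied from a PDF that fall in the
# Supplementary Private Use Area-B block of Unicode.
# The function names below are hypothetical, chosen for this example.

PUA_B_START = 0x100000  # first code point of the block
PUA_B_END = 0x10FFFF    # last code point of the block


def find_private_use(text):
    """Return (index, 'U+XXXXXX') pairs for characters in PUA-B."""
    return [
        (i, f"U+{ord(ch):06X}")
        for i, ch in enumerate(text)
        if PUA_B_START <= ord(ch) <= PUA_B_END
    ]


def copies_cleanly(text):
    """True if the pasted text contains no PUA-B characters."""
    return not find_private_use(text)


if __name__ == "__main__":
    # An ordinary Arabic letter followed by a private-use character,
    # standing in for a glyph that has no usable Unicode mapping.
    pasted = "\u0628\U00100001"
    for index, code_point in find_private_use(pasted):
        print(f"position {index}: {code_point}")
```

Pasting the copied text into a string and running it through
`find_private_use` shows at which positions the mapping back to Unicode
was lost.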
In a PDF file with two identical Arabic paragraphs, one set in
Scheherazade (with the heading in Lateef) and the other in Lotus Linotype
(a commercial font), copying and searching work without problems with
the latter, but not with Scheherazade or Lateef. (Note that all three
fonts are encoded with Identity-H/CID.) See the attachment, where the
first paragraph is Lateef+Scheherazade and the second Lotus Linotype.
I intend to test this with other Arabic fonts later.
FC
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ArabicOpenType.pdf
Type: application/pdf
Size: 55469 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20080206/6cabeed9/attachment-0001.pdf