[tex4ht] curiosity about unicode.4hf
Matteo Gamboz
gamboz at medialab.sissa.it
Mon Mar 13 14:10:22 CET 2017
Hi all
this is a bit similar to
http://tex.stackexchange.com/questions/328441/tex4ht-unicode-representations-of-apostrophe-in-utf-8-html-source
(please feel free to tell me to post on tex.stackexchange)
I have a curiosity about a unicode entity.
Here is the situation: when I take a tex file such as the following
cat > a.tex <<'EOF'
\documentclass{article}
\begin{document}
'
\end{document}
EOF
and run it through
htlatex a "xhtml" " -cunihtf -utf8"
I get "a.html" that contains:
...&#x2019;...
(where "&#x2019;" is the numeric character reference for "’", U+2019 RIGHT SINGLE QUOTATION MARK)
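A numeric character reference is read as hexadecimal only when it has the "x"; without it the digits are decimal. The small POSIX shell sketch below (the helper names hex_to_dec and dec_to_hex are hypothetical, built only on the standard printf utility) shows that &#x2019; and &#8217; denote the same character, while &#2019; would be a different one.

```sh
#!/bin/sh
# Numeric character references: "&#x...;" is hexadecimal, "&#...;" is decimal.

# hex_to_dec: hex digits (no 0x prefix) -> decimal, e.g. 2019 -> 8217
hex_to_dec() {
  printf '%d\n' "0x$1"
}

# dec_to_hex: decimal -> upper-case hex, padded to at least four digits
dec_to_hex() {
  printf '%04X\n' "$1"
}

hex_to_dec 2019   # prints 8217: &#x2019; and &#8217; are both U+2019
dec_to_hex 2019   # prints 07E3: &#2019; (no x) would be U+07E3 instead
```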
This is because of the file
/usr/local/texlive/2016/texmf-dist/tex4ht/ht-fonts/unicode/charset/unicode.4hf
which contains lines that keep the following characters as character references in the output instead of converting them to UTF-8:
<  (U+003C)
>  (U+003E)
"  (U+0022)
’  (U+2019)
&  (U+0026)
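To see which hexadecimal references a charset file such as unicode.4hf actually contains, or to check for one specific reference such as &#x2018;, a grep pipeline is enough. This is a sketch under stated assumptions: it needs a grep that supports -o (GNU or BSD grep), it makes no assumption about the rest of the .4hf line format, and list_hex_refs and has_hex_ref are hypothetical helper names, not part of tex4ht.

```sh
#!/bin/sh
# list_hex_refs FILE: print the distinct &#x....; references found in FILE,
# hex digits upper-cased, sorted in byte order.
list_hex_refs() {
  grep -o '&#x[0-9A-Fa-f]*;' "$1" | tr 'a-f' 'A-F' | LC_ALL=C sort -u
}

# has_hex_ref FILE DIGITS: succeed if FILE contains &#xDIGITS; (case-insensitive).
# Leading zeros in the file are tolerated, so "3C" also matches "&#x003C;".
has_hex_ref() {
  grep -qi "&#x0*$2;" "$1"
}
```

For example, `list_hex_refs /usr/local/texlive/2016/texmf-dist/tex4ht/ht-fonts/unicode/charset/unicode.4hf` lists every hex reference in that file, and `has_hex_ref` on the same path with `2018` reports whether &#x2018; is present.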
AFAIK, ' and " are illegal inside attribute values (when used as the delimiter), but ’ and ‘ (&#x2018;) should
not be; and &#x2018; is not in the file (TeX Live 2016).
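To make the attribute argument concrete: in XML the characters that need escaping in an attribute value are the markup-significant ASCII ones (&, <, and the quote used as the delimiter; > is usually escaped too, by convention). The POSIX shell sketch below is a hypothetical helper, xml_attr_escape, not anything tex4ht provides. It escapes only those characters with sed and leaves ’ and ‘ untouched, which is why keeping U+2019 as a reference is not required for well-formedness.

```sh
#!/bin/sh
# xml_attr_escape STRING: escape STRING for use inside an XML attribute value.
# Only ASCII markup characters are replaced; everything else, including
# U+2019 (’) and U+2018 (‘), passes through unchanged.
# "&" is replaced first so the entities added afterwards are not re-escaped.
xml_attr_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' \
                         -e 's/</\&lt;/g' \
                         -e 's/>/\&gt;/g' \
                         -e 's/"/\&quot;/g' \
                         -e "s/'/\&#39;/g"
}

printf '%s\n' "$(xml_attr_escape "it’s <b> & \"q\"")"
```

Here the straight apostrophe is written as &#39; rather than &apos; because &apos; is not defined in HTML 4.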
Does anyone know why &#x2019; ended up in unicode.4hf?
Thanks all
m