[XeTeX] [tex-hyphen] Help with UTF-8 Language
Reinhard Kotucha
reinhard.kotucha at web.de
Sun Oct 12 01:51:31 CEST 2014
On 2014-10-10 at 07:51:53 +0200, Werner LEMBERG wrote:
> Unfortunately I don't have time to write a Perl or Python script for
> you, but it should be straightforward to program a small filter that
>
> (a) converts from UTF-8 to UTF-16
> (b) converts from UTF-16 to the ad-hoc 8bit encoding by stripping
> off the higher byte
Hi Werner, you don't need Perl for (a).
iconv -f UTF-8 -t UTF-16BE -o <outfile> <infile>
or, more verbose,
iconv --from-code=UTF-8 --to-code=UTF-16BE --output=<outfile> <infile>
Converting from UTF-16 to UTF-8 is easier
iconv -f UTF-16 -t UTF-8 -o <outfile> <infile>
because the byte order is determined by the BOM.
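To illustrate the role of the BOM, here is a small Python sketch. It assumes nothing about how iconv works internally; it only shows that Python's generic "utf-16" codec also reads the byte order from the BOM, so the same decode call handles both variants. The sample string and variable names are made up for the example.

```python
import codecs

text = "Grüße"  # sample string, chosen only for illustration

# Build the same text in both byte orders, each prefixed with its BOM.
data_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
data_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The generic "utf-16" codec inspects the BOM, picks the byte order,
# and drops the BOM from the decoded result.
decoded_le = data_le.decode("utf-16")
decoded_be = data_be.decode("utf-16")

# Re-encoding as UTF-8 gives identical bytes in both cases.
utf8_le = decoded_le.encode("utf-8")
utf8_be = decoded_be.encode("utf-8")
```

Without a BOM the decoder has to guess or be told the byte order explicitly, which is why the other direction needs UTF-16BE or UTF-16LE spelled out.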
(b) is more difficult due to the endianness. Whether you have to
strip the lower or the higher byte depends on whether you converted to
UTF-16LE or UTF-16BE.
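A minimal Python sketch of steps (a) and (b) together, to show how the byte to keep depends on the byte order. The function name strip_high_byte and its byteorder parameter are hypothetical, not part of any existing tool. It also assumes every character fits in one 16-bit unit; characters above U+00FF lose their high byte, and characters outside the BMP would turn into two bytes of surrogate remnants.

```python
def strip_high_byte(text: str, byteorder: str = "big") -> bytes:
    """Encode text as UTF-16 and keep only the low byte of each 16-bit unit."""
    if byteorder == "big":
        # UTF-16BE layout: high, low, high, low, ...
        # The low bytes sit at the odd offsets.
        return text.encode("utf-16-be")[1::2]
    if byteorder == "little":
        # UTF-16LE layout: low, high, low, high, ...
        # The low bytes sit at the even offsets.
        return text.encode("utf-16-le")[0::2]
    raise ValueError("byteorder must be 'big' or 'little'")


# Reading UTF-8 input is an ordinary decode step before the call.
utf8_input = "Aé".encode("utf-8")
eight_bit = strip_high_byte(utf8_input.decode("utf-8"), "big")
```

The explicit "-be" and "-le" codecs are used here because they never emit a BOM, so the slicing does not have to skip one. For text limited to U+0000..U+00FF the result is the same as encoding to Latin-1.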
iconv is ubiquitous on Linux and maybe on other Unix systems too.
However, a few months ago I created binaries for Windows using the MXE
cross compiler. Extract
http://ms25.x64.me/w32/iconv/iconv.zip
in a directory which is in PATH.
Regards,
Reinhard
--
------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:reinhard.kotucha at web.de
------------------------------------------------------------------