
Unicode and composite characters



Berthold wrote --
> 
>    correct me if I'm wrong, but the unicode people have defined the
>    lonely accent circumflex? & it's not a linguistic glyph 
>    (never used for itself) 
> 
> The UNICODE people - like any good committee - are not of one mind on this.
> 
> On the one hand they explicitly state (at least in earlier versions)
> that the `non spacing' accents are provided for constructing composites,
> yet also insist on listing all composites that actually occur.  
> 
> There are also `spacing' accents, by the way, whatever that is.  And
> the non-spacing ones are non-sense since they are meant to accent the
> character that comes before - nobody has thought about the spacing /
> kerning issues.

That is because the details of spacing and kerning are properties
related to typography and glyphs and as such are explicitly and wisely
not the concern of a character encoding.

At an abstract level, the reason why the Unicode standard needs to
contain, and to define the _use_ of, what it calls "non-spacing marks"
and "combining characters" is described in Sections 3.9 and 5.9 of The
Unicode Standard Version 2.0 (A-W 1996).

Particular applications that make use of the decomposed forms are
sorting and searching (see section 5.15); these are (normally, at
least) independent of typographic conventions and glyphs and are the
concern of a character encoding.
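
To make the searching case concrete, here is a minimal sketch in
Python.  It assumes the standard unicodedata module and its "NFD"
canonical decomposition, which are conveniences of that library and
not something taken from the sections cited above; the function names
are hypothetical.  It shows only that a composite and its decomposed
spelling can be matched purely at the character level, with no
reference to fonts or glyphs.

```python
import unicodedata

def decompose(text):
    """Return the canonically decomposed (NFD) form of text."""
    return unicodedata.normalize("NFD", text)

def strip_marks(text):
    """Decompose, then drop the combining marks, keeping base characters."""
    return "".join(
        ch for ch in decompose(text) if unicodedata.combining(ch) == 0
    )

def matches_ignoring_accents(needle, haystack):
    """Substring search that ignores accents (a deliberately crude rule)."""
    return strip_marks(needle) in strip_marks(haystack)

# The same word spelled two ways:
precomposed = "r\u00f4le"   # o-circumflex as a single composite character
combining = "ro\u0302le"    # o followed by U+0302 COMBINING CIRCUMFLEX ACCENT

# The code point sequences differ, but the decomposed forms agree.
same_after_decomposition = decompose(precomposed) == decompose(combining)
```

Note that the combining mark follows its base character in the
decomposed spelling, and that nothing here depends on how (or
whether) either spelling is rendered.  Real sorting needs
language-specific rules on top of this; the sketch covers only the
decomposition step.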

Please do not interpret my defence of Unicode as meaning that I think
that Unicode "gets it all right"; it certainly does not!  But the
existence of a dotless j (or i, or anything else) in Unicode is not
closely related to whether a font needs to contain this glyph: it is
relevant only to whether applications concerned purely with characters
need it.


chris