[tex-live] Strange license of ukhyphen (fwd)

Petr Sojka sojka at fi.muni.cz
Mon May 29 13:53:08 CEST 2006


> Werner LEMBERG wrote:
> > What about the following: Get a reliable list of UK English words
> > (probably sorted by frequency), apply the current UK patterns,
> > carefully check the results and regenerate the patterns.
> >
> good idea. curiously, my institution curates
> a 100 million word corpus of British English
> (http://www.natcorp.ox.ac.uk/), marked up
> to the word level; deriving a
> list of words from that would be a rather
> small bit of XML retrieval.
> 
> If I get the list of words, does anyone
> else have the time and energy to make the
> experiment?

I am willing to do the pattern generation part.
But
-- The BNC wordlist (which I have too) is full of non-English words,
   proper names, ...; who will do the cleanup?
-- The most time-consuming step is having the hyphenated
   BNC wordlist checked by somebody who knows the etymology
   of English words -- this is the rule OUP
   use in deciding on (UK) hyphenation points.
US people/publishers use quite different rules
(basically syllable-based).

Send me the cleaned UK wordlist and I'll do the bootstrap phase
(prepare the hyphenated list and a list of
candidates for checking [potential exceptions]).
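
For anyone unfamiliar with what "apply the current patterns to the wordlist"
means in practice, here is a minimal sketch of Liang-style pattern matching
in Python. It is only an illustration, not the tooling that would actually
be used for the bootstrap: the names parse_pattern, hyphenate and
hyphenate_wordlist are hypothetical, TOY_PATTERNS is a tiny illustrative
set that is NOT taken from ukhyphen, and the left/right minimum values
(2 and 3) are assumed defaults.

```python
def parse_pattern(pattern):
    """Split a Liang-style pattern such as 'hy3ph' into letters and digits.

    Returns (letters, values); values[i] is the number standing before
    letters[i], and the last entry is the number after the final letter.
    """
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Insert '-' at every position where the winning pattern value is odd."""
    padded = "." + word.lower() + "."      # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[p] = gap before padded[p]
    for letters, values in patterns:
        start = padded.find(letters)
        while start != -1:
            for offset, value in enumerate(values):
                pos = start + offset
                scores[pos] = max(scores[pos], value)   # highest value wins
            start = padded.find(letters, start + 1)
    pieces = []
    last = 0
    # i is a break position inside the word: word[:i] | word[i:]
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:         # odd value allows a break
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


def hyphenate_wordlist(words, patterns):
    """Return the hyphenated form of every word, in input order."""
    return [hyphenate(w, patterns) for w in words]


# Toy patterns for illustration only (assumption: not from ukhyphen).
TOY_PATTERNS = [parse_pattern(p) for p in
                ("hy3ph", "he2n", "hena4", "hen5at",
                 "1na", "n2at", "1tio", "2io")]

if __name__ == "__main__":
    print(hyphenate_wordlist(["hyphenation", "cat"], TOY_PATTERNS))
    # prints ['hy-phen-ation', 'cat']
```

The output of such a run over the cleaned wordlist is the "hyphenated list"
that then has to be checked by hand; how the candidates for checking are
selected is not shown here.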

All the best

--ps

