Grapheme to Phoneme Conversion using Finite State Calculus and Transformation-based Learning

Gosse Bouma (Rijksuniversiteit Groningen)

We present a method for automatically converting a (Dutch) word into
a sequence of phonemes representing its pronunciation. A finite state
transducer for this task was constructed by composing a transducer
that segments the input string into graphemes with a cascade of
(context-sensitive) replace rules, which transduce each grapheme into
one or more phonemes. The resulting system achieves over 93% (phoneme) 
accuracy on test data from a word list (Celex). Next, we applied 
transformation-based learning (TBL) to the results of the (manually
constructed) finite state system. We performed several experiments,
using both Brill's algorithm and a lazy (Monte Carlo) version of TBL.
Using training sets of up to 40,000 words, the TBL system achieves
98% accuracy.
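The segmentation-plus-cascade pipeline described above can be sketched procedurally as follows. This is a minimal Python simulation, not a finite state implementation. Instead of compiling replace rules into transducers and composing them, it splits the word into graphemes by greedy longest match and then applies each rule in turn to the symbol sequence. The grapheme inventory, the rules, and the ASCII phoneme symbols (`x`, `Ei`, `@`) are hypothetical illustrations. They are not the rule set of the system described here.

```python
from typing import List, Optional, Tuple

# Hypothetical inventory of multi-letter Dutch graphemes; any other
# single letter is treated as a grapheme of its own.
GRAPHEMES = ["sch", "ch", "ng", "ij", "ie", "oe", "ui", "eu", "ou", "ei",
             "aa", "ee", "oo", "uu"]

def segment(word: str) -> List[str]:
    """Greedy left-to-right, longest-match split of a word into graphemes."""
    inventory = sorted(GRAPHEMES, key=len, reverse=True)
    graphemes, i = [], 0
    while i < len(word):
        for g in inventory:
            if word.startswith(g, i):
                graphemes.append(g)
                i += len(g)
                break
        else:
            graphemes.append(word[i])
            i += 1
    return graphemes

# A rule rewrites `target` as `output` when the left/right neighbours match.
# None means "any context"; "#" is the word boundary; "" deletes the symbol.
Rule = Tuple[str, str, Optional[str], Optional[str]]

# Hypothetical, deliberately tiny cascade; the order of rules matters.
RULES: List[Rule] = [
    ("n", "", "e", "#"),     # drop word-final n after e: lopen -> lope
    ("e", "@", None, "#"),   # word-final e is schwa (must follow n-deletion)
    ("d", "t", None, "#"),   # final devoicing
    ("ij", "Ei", None, None),
    ("ie", "i", None, None),
    ("oe", "u", None, None),
    ("sch", "sx", None, None),
    ("ch", "x", None, None),
    ("g", "x", None, None),
]

def apply_rule(seq: List[str], rule: Rule) -> List[str]:
    """Apply one rule at every position; contexts are read from the input."""
    target, output, left, right = rule
    padded = ["#"] + seq + ["#"]
    out = []
    for i in range(1, len(padded) - 1):
        match = (padded[i] == target
                 and (left is None or padded[i - 1] == left)
                 and (right is None or padded[i + 1] == right))
        out.append(output if match else padded[i])
    return [s for s in out if s]  # remove deleted symbols

def transcribe(word: str) -> str:
    """Segment, then run the cascade (the analogue of composing the machines)."""
    seq = segment(word)
    for rule in RULES:
        seq = apply_rule(seq, rule)
    return "".join(seq)

print(transcribe("hoed"), transcribe("schrijven"))  # prints: hut sxrEiv@
```

The first two rules show why a cascade is order-sensitive. The final `e` of *schrijven* only becomes word-final, and thus eligible for the schwa rule, after the `n` has been deleted.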
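Applying transformation-based learning to the output of the finite state system can be illustrated with a minimal Brill-style sketch. It assumes that each baseline transcription has already been aligned one-to-one with its gold transcription, and it uses two hypothetical rule templates: "change phoneme A to B if the previous (or next) phoneme is C". It implements only the plain greedy loop, where a rule's score is the number of errors corrected minus errors introduced. It does not implement the lazy, Monte Carlo version mentioned above. The toy data is invented for illustration.

```python
from typing import List, Set, Tuple

# A transformation: rewrite phoneme `frm` as `to` when the neighbour at
# `offset` (-1 = previous, +1 = next) is `ctx`; "#" marks the word boundary.
Rule = Tuple[str, str, int, str]

def neighbour(seq: List[str], i: int, offset: int) -> str:
    j = i + offset
    return seq[j] if 0 <= j < len(seq) else "#"

def apply_transformation(rule: Rule, seq: List[str]) -> List[str]:
    frm, to, offset, ctx = rule
    # Contexts are read from the unmodified input sequence.
    return [to if s == frm and neighbour(seq, i, offset) == ctx else s
            for i, s in enumerate(seq)]

def candidate_rules(preds: List[List[str]], golds: List[List[str]]) -> Set[Rule]:
    # Instantiate both templates only at positions that are currently wrong.
    cands: Set[Rule] = set()
    for p, g in zip(preds, golds):
        for i, (ps, gs) in enumerate(zip(p, g)):
            if ps != gs:
                for off in (-1, 1):
                    cands.add((ps, gs, off, neighbour(p, i, off)))
    return cands

def score(rule: Rule, preds: List[List[str]], golds: List[List[str]]) -> int:
    # Brill's criterion: errors corrected minus errors introduced.
    delta = 0
    for p, g in zip(preds, golds):
        for old, new, gold in zip(p, apply_transformation(rule, p), g):
            if old != new:
                delta += int(new == gold) - int(old == gold)
    return delta

def learn(preds, golds, min_score: int = 1, max_rules: int = 50) -> List[Rule]:
    """Greedily pick the best-scoring transformation until none helps enough."""
    preds = [list(p) for p in preds]
    learned: List[Rule] = []
    for _ in range(max_rules):
        cands = sorted(candidate_rules(preds, golds))  # sorted: deterministic ties
        if not cands:
            break
        best = max(cands, key=lambda r: score(r, preds, golds))
        if score(best, preds, golds) < min_score:
            break
        learned.append(best)
        preds = [apply_transformation(best, p) for p in preds]
    return learned

# Hypothetical aligned toy data: the baseline misses final devoicing of "d".
baseline = [["h", "u", "d"], ["b", "E", "d"], ["d", "a", "k"]]
gold = [["h", "u", "t"], ["b", "E", "t"], ["d", "a", "k"]]
print(learn(baseline, gold))  # [('d', 't', 1, '#')]
```

The rule conditioned on the word boundary wins because it fixes both errors, while each rule conditioned on a preceding vowel fixes only one. Rescoring every candidate over the whole training set on each iteration is the main cost of this greedy loop.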