A Perceptual Alphabet for the 10-dimensional Phonetic-prosodic Space
This work addresses speech representation for recognition systems. It appears incremental: it supersedes an earlier version and supplements a separate exposition of the oral billiards model.
The paper represents speech by defining a perceptual alphabet, the IHA, for a 10-dimensional phonetic-prosodic space, based on the oral billiards model. The alphabet has been implemented in a speech recognizer.
We define an alphabet, the IHA, of the 10-D phonetic-prosodic space. The dimensions of this space are perceptual observables rather than articulatory specifications. Speech is defined as a random chain in time of the 4-D phonetic subspace, that is, a symbolic sequence, augmented with diacritics of the remaining 6-D prosodic subspace. The definitions here are based on the oral billiards model of speech and supersede an earlier version. This paper only enumerates the IHA in detail, as a supplement to the exposition of oral billiards in a separate paper. The IHA has been implemented as the target random variable in a speech recognizer.
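The split into a 4-D phonetic symbol and 6-D prosodic diacritics can be pictured as a simple data structure. The following is only a minimal sketch under stated assumptions: the class name `Letter`, the helper `phonetic_chain`, and the use of integer codes for each coordinate are hypothetical and are not taken from the IHA itself, whose actual dimensions and values are enumerated in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

PHONETIC_DIMS = 4  # size of the phonetic subspace, as stated in the abstract
PROSODIC_DIMS = 6  # size of the prosodic subspace, as stated in the abstract


@dataclass(frozen=True)
class Letter:
    """One element of a symbolic sequence (hypothetical representation).

    Each coordinate is an integer placeholder; the real IHA values are
    perceptual observables defined in the paper, not these codes.
    """
    phonetic: Tuple[int, ...]  # the 4-D phonetic symbol
    prosodic: Tuple[int, ...]  # the 6-D prosodic diacritics

    def __post_init__(self) -> None:
        if len(self.phonetic) != PHONETIC_DIMS:
            raise ValueError("phonetic part must have 4 coordinates")
        if len(self.prosodic) != PROSODIC_DIMS:
            raise ValueError("prosodic part must have 6 coordinates")

    def as_point(self) -> Tuple[int, ...]:
        """Return the letter as one point of the 10-D space."""
        return self.phonetic + self.prosodic


def phonetic_chain(utterance: List[Letter]) -> List[Tuple[int, ...]]:
    """Strip the diacritics, leaving the chain of phonetic symbols."""
    return [letter.phonetic for letter in utterance]


# Usage: a two-letter utterance with arbitrary placeholder codes.
utterance = [
    Letter(phonetic=(1, 0, 2, 0), prosodic=(0, 0, 1, 0, 0, 0)),
    Letter(phonetic=(0, 3, 0, 1), prosodic=(1, 0, 0, 0, 2, 0)),
]
chain = phonetic_chain(utterance)
```

Keeping the two parts as separate fields mirrors the description of speech as a phonetic chain "augmented with" prosodic diacritics: a recognizer could, under this assumption, model the chain first and attach the diacritics afterwards.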