Decoding visemes: improving machine lipreading
This work aims to improve speech recognition from visual signals, with applications such as assistive technology. It appears incremental, since it builds on existing phoneme and viseme classification methods.
The paper tackles machine lip-reading by introducing a novel two-pass training method for phoneme classifiers. The first pass uses previously trained viseme classifiers, and the authors report classification performance that significantly improves on prior approaches.
Machine lip-reading is the task of recognising speech from a visual signal. Current work often uses viseme classification supported by language models, with varying degrees of success. A few recent works suggest that phoneme classification can, in the right circumstances, outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers, which uses previously trained viseme classifiers in the first pass. With this training algorithm, we show classification performance that significantly improves on previous lip-reading results.
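
The abstract does not say which classifier is used or exactly how the viseme models are reused, so the following is only a minimal sketch of one plausible reading of a two-pass scheme. It assumes that each phoneme belongs to exactly one viseme class, that a class model is a single mean vector, and that pass two starts each phoneme model from its parent viseme model before re-estimating it on phoneme-labelled data. The mapping PHONEME_TO_VISEME, the function names, and the weight parameter are all hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical phoneme-to-viseme map (illustrative only, not taken from the paper).
PHONEME_TO_VISEME = {
    "p": "V_bilabial",
    "b": "V_bilabial",
    "f": "V_labiodental",
    "v": "V_labiodental",
}


def train_visemes(features, phoneme_labels):
    """Pass 1 (assumed form): one mean vector per viseme, pooling its phonemes."""
    viseme_labels = [PHONEME_TO_VISEME[p] for p in phoneme_labels]
    label_arr = np.array(viseme_labels)
    return {
        v: features[label_arr == v].mean(axis=0)
        for v in sorted(set(viseme_labels))
    }


def train_phonemes(features, phoneme_labels, viseme_means, weight=1.0):
    """Pass 2 (assumed form): start from the parent viseme mean, refine on phoneme data."""
    labels = list(phoneme_labels)
    label_arr = np.array(labels)
    models = {}
    for p in sorted(set(labels)):
        samples = features[label_arr == p]
        prior = viseme_means[PHONEME_TO_VISEME[p]]
        # The viseme mean counts as `weight` pseudo-observations of the phoneme.
        models[p] = (weight * prior + samples.sum(axis=0)) / (weight + len(samples))
    return models


def classify(x, models):
    """Return the phoneme whose model mean is nearest to x (squared Euclidean)."""
    return min(models, key=lambda p: float(np.sum((x - models[p]) ** 2)))


# Toy, hand-made feature vectors; real systems would use lip-shape features.
features = np.array([
    [0.0, 0.0], [0.0, 0.2],   # "p"
    [0.0, 1.0], [0.0, 1.2],   # "b"
    [5.0, 0.0], [5.0, 0.2],   # "f"
    [5.0, 1.0], [5.0, 1.2],   # "v"
])
labels = ["p", "p", "b", "b", "f", "f", "v", "v"]

viseme_means = train_visemes(features, labels)
phoneme_models = train_phonemes(features, labels, viseme_means)
prediction = classify(np.array([0.0, 0.1]), phoneme_models)
```

In this toy setting the first dimension separates the two visemes and the second separates phonemes within a viseme, so the viseme-level initialisation gives each phoneme model a sensible starting point even when it has few training samples of its own.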