CV | AS | Oct 3, 2017

Decoding visemes: improving machine lipreading

arXiv:1710.01169v1 | 49 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving speech recognition from visual signals for applications like assistive technology, though it appears incremental as it builds on existing phoneme and viseme classification methods.

The paper tackles the problem of machine lip-reading by introducing a novel two-pass training method for phoneme classifiers that leverages previously trained visemes, resulting in significantly improved classification performance over prior approaches.

To undertake machine lip-reading, we try to recognise speech from a visual signal. Current work often uses viseme classification supported by language models with varying degrees of success. A few recent works suggest phoneme classification, in the right circumstances, can outperform viseme classification. In this work we present a novel two-pass method of training phoneme classifiers which uses previously trained visemes in the first pass. With our new training algorithm, we show classification performance which significantly improves on previous lip-reading results.
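
The two-pass idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the classifier type, features, or phoneme-to-viseme map, so everything below is an assumption made for illustration. The map `PHONEME_TO_VISEME`, the nearest-centroid classifier, the `prior_weight` parameter, and the 2-D feature vectors are all hypothetical. Pass 1 trains one model per viseme on pooled data; pass 2 initialises each phoneme model from its parent viseme model and then refines it on that phoneme's own samples.

```python
from collections import defaultdict

# Hypothetical phoneme-to-viseme map (illustrative only, not taken from the paper).
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
}

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train_visemes(samples):
    """Pass 1: one centroid per viseme, pooling every phoneme mapped to it."""
    pooled = defaultdict(list)
    for phoneme, feat in samples:
        pooled[PHONEME_TO_VISEME[phoneme]].append(feat)
    return {viseme: mean(feats) for viseme, feats in pooled.items()}

def train_phonemes(samples, viseme_models, prior_weight=1.0):
    """Pass 2: start each phoneme centroid at its viseme centroid, then
    pull it toward the phoneme's own data (more data, stronger pull)."""
    by_phoneme = defaultdict(list)
    for phoneme, feat in samples:
        by_phoneme[phoneme].append(feat)
    models = {}
    for phoneme, viseme in PHONEME_TO_VISEME.items():
        if viseme not in viseme_models:
            continue  # no pass-1 model to initialise from
        init = viseme_models[viseme]
        feats = by_phoneme.get(phoneme, [])
        if not feats:
            models[phoneme] = list(init)  # fall back to the viseme model
            continue
        w = len(feats) / (len(feats) + prior_weight)
        data_mean = mean(feats)
        models[phoneme] = [w * d + (1 - w) * i for d, i in zip(data_mean, init)]
    return models

def classify(feat, models):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    return min(models, key=lambda k: sum((a - b) ** 2 for a, b in zip(feat, models[k])))

# Toy data: (phoneme label, made-up 2-D lip feature vector).
samples = [
    ("p", [0.0, 0.0]), ("p", [0.2, 0.0]), ("b", [1.0, 0.0]),
    ("f", [5.0, 5.0]), ("v", [6.0, 5.0]),
]
viseme_models = train_visemes(samples)
phoneme_models = train_phonemes(samples, viseme_models)
```

Calling `classify([0.1, 0.0], phoneme_models)` then picks among phoneme labels rather than viseme labels. The interpolation in pass 2 is only one simple way to reuse a viseme model as a starting point; the paper's own refinement procedure may differ.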
