CV IV, Oct 3, 2017

Some observations on computer lip-reading: moving from the dream to the reality

Helen L. Bear, Gari Owen, Richard Harvey, Barry-John Theobald

arXiv:1710.01084v12.413 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of improving lip-reading accuracy for applications such as assistive technology. It is incremental: it reviews and critiques existing assumptions rather than proposing a new method.

The paper challenges common assumptions in computer lip-reading. It finds that performance is not heavily constrained by practical factors such as video resolution, pose, or lighting, and it questions whether visemes are the optimal recognition unit.

In the quest for greater computer lip-reading performance there are a number of tacit assumptions which are either present in the datasets (high resolution for example) or in the methods (recognition of spoken visual units called visemes for example). Here we review these and other assumptions and show the surprising result that computer lip-reading is not heavily constrained by video resolution, pose, lighting and other practical factors. However, the working assumption that visemes, which are the visual equivalent of phonemes, are the best unit for recognition does need further examination. We conclude that visemes, which were defined over a century ago, are unlikely to be optimal for a modern computer lip-reading system.
