CV | IV | Oct 3, 2017

Some observations on computer lip-reading: moving from the dream to the reality

arXiv:1710.01084v1 | 13 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of improving lip-reading accuracy for applications such as assistive technology. Its contribution is incremental: it reviews and critiques existing assumptions rather than proposing a new method.

The paper challenges common assumptions in computer lip-reading. It finds that performance is not heavily constrained by practical factors such as video resolution, pose, or lighting, and it questions whether visemes are the optimal recognition unit.

In the quest for greater computer lip-reading performance there are a number of tacit assumptions which are either present in the datasets (high resolution for example) or in the methods (recognition of spoken visual units called visemes for example). Here we review these and other assumptions and show the surprising result that computer lip-reading is not heavily constrained by video resolution, pose, lighting and other practical factors. However, the working assumption that visemes, which are the visual equivalent of phonemes, are the best unit for recognition does need further examination. We conclude that visemes, which were defined over a century ago, are unlikely to be optimal for a modern computer lip-reading system.

