CVAS, Oct 3, 2017

Visual gesture variability between talkers in continuous visual speech

arXiv:1710.01297v1
Originality: Synthesis-oriented
AI Analysis

This work addresses a challenge in lipreading systems, with the aim of improving speech recognition accuracy and supporting applications such as speech therapy. It is incremental, however, because it extends prior research from isolated words to continuous speech.

The study investigated whether the mapping between visual speech gestures (visemes) and phonemes is speaker-dependent in continuous speech. It found that the trajectory between visemes has a greater negative effect on speaker differentiation in continuous speech than in isolated word recognition.

The recent adoption of deep learning methods in machine lipreading research gives us two options for improving system performance. Either we develop end-to-end systems holistically, or we experiment to further our understanding of the visual speech signal. The latter option is more difficult, but the resulting knowledge would enable researchers both to improve systems and to apply it to other domains such as speech therapy. One challenge in lipreading systems is the correct labeling of the classifiers. These labels encode an estimated mapping between the visemes on the lips and the phonemes uttered. Here we ask whether such maps are speaker-dependent. Prior work investigated isolated word recognition with speaker-dependent (SD) visemes; we extend this to continuous speech. Benchmarking against SD results and against isolated-word performance, we test with speakers from the RMAV dataset and observe that, in continuous speech, the trajectory between visemes has a greater negative effect on speaker differentiation.
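To make the idea of a speaker-dependent phoneme-to-viseme map concrete, the Python sketch below groups phonemes into visemes from one speaker's phoneme confusion counts and then checks whether two speakers yield the same map. The clustering rule (merge two phonemes when their mutual confusions reach a threshold, transitively), the toy confusion counts, and the names `viseme_partition` and `to_labels` are all illustrative assumptions; this is not the clustering method used in the paper.

```python
from itertools import combinations


def viseme_partition(phonemes, confusions, threshold):
    """Group phonemes into visemes for one speaker.

    ASSUMPTION (illustrative rule, not the paper's method): two phonemes
    share a viseme if they are confused with each other, in either
    direction, at least `threshold` times. Grouping is made transitive
    with a union-find structure.
    """
    parent = {p: p for p in phonemes}

    def find(p):
        # Follow parent links to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phonemes, 2):
        mutual = confusions.get((a, b), 0) + confusions.get((b, a), 0)
        if mutual >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for p in phonemes:
        groups.setdefault(find(p), set()).add(p)
    # A partition (set of frozensets) compares equal regardless of label names.
    return frozenset(frozenset(g) for g in groups.values())


def to_labels(partition):
    """Assign viseme labels V1, V2, ... ordered by each group's first phoneme."""
    ordered = sorted(partition, key=min)
    return {p: f"V{i}" for i, group in enumerate(ordered, start=1) for p in group}


# Hypothetical toy confusion counts: confusions[(a, b)] = times a was recognised as b.
phonemes = ["p", "b", "m", "f", "v"]
speaker_a = {("p", "b"): 5, ("b", "m"): 4, ("f", "v"): 6}
speaker_b = {("p", "b"): 5, ("f", "v"): 6, ("m", "f"): 1}

map_a = viseme_partition(phonemes, speaker_a, threshold=3)
map_b = viseme_partition(phonemes, speaker_b, threshold=3)

print(to_labels(map_a))
print("same map for both speakers:", map_a == map_b)
```

With these toy counts, speaker A groups {p, b, m} and {f, v}, while speaker B groups {p, b}, {m} and {f, v}, so the two maps differ. That difference is the sense in which a viseme map is speaker-dependent. Representing each map as a partition rather than a label dictionary keeps the comparison independent of how the visemes happen to be named.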
