CV, AS · Oct 24, 2018

The speaker-independent lipreading play-off; a survey of lipreading machines

arXiv:1810.10597v1 · 12 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses speaker independence in lipreading, a key challenge for this gesture classification task. Its contribution is incremental: it provides benchmarks rather than a new method.

The paper tackles speaker-independent lipreading through a systematic survey on the TCD-TIMIT dataset. The best configuration, CNN features with an SVM classifier, reaches 69.58% accuracy. This is lower than speaker-dependent state-of-the-art results but higher than previously reported speaker-independent results.

Lipreading is a difficult gesture classification task. One problem in computer lipreading is speaker independence: achieving the same accuracy on test speakers who are not in the training set as on speakers who are. The literature on speaker-independent lipreading is limited. The few independent test-speaker accuracy scores that exist are usually aggregated with dependent test-speaker accuracies into a single averaged performance, which obscures the independent results. Here we undertake a systematic survey of experiments on the TCD-TIMIT dataset, using both conventional approaches and deep learning methods, to provide a series of wholly speaker-independent benchmarks. We show that the best speaker-independent machine scores 69.58% accuracy with CNN features and an SVM classifier. This is lower than state-of-the-art speaker-dependent lipreading machines, but higher than previously reported in independence experiments.
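The distinction between speaker-dependent and speaker-independent testing comes down to how the data are split and how accuracy is reported. The following is a minimal sketch, not the paper's code. It assumes each sample is a dict with hypothetical `speaker` and `label` keys and that predictions arrive as `(speaker, true_label, predicted_label)` tuples. It does not reproduce the CNN-feature or SVM pipeline. It only shows a speaker-disjoint split and reports independent accuracy separately rather than folding it into a pooled average.

```python
def speaker_independent_split(samples, test_speakers):
    """Partition samples so that no test speaker appears in the training set.

    `samples` is a list of dicts with a hypothetical "speaker" key.
    """
    test_speakers = set(test_speakers)
    train = [s for s in samples if s["speaker"] not in test_speakers]
    test = [s for s in samples if s["speaker"] in test_speakers]
    return train, test


def accuracy_by_group(predictions, train_speakers):
    """Report dependent, independent, and pooled accuracy separately.

    `predictions` holds (speaker, y_true, y_pred) tuples. A speaker counts as
    "dependent" if they were seen in training, else "independent".
    """
    counts = {"dependent": [0, 0], "independent": [0, 0]}  # [correct, total]
    for speaker, y_true, y_pred in predictions:
        group = "dependent" if speaker in train_speakers else "independent"
        counts[group][0] += int(y_true == y_pred)
        counts[group][1] += 1

    result = {g: (c / n if n else None) for g, (c, n) in counts.items()}
    total_correct = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    # The pooled figure mixes both groups, which is what hides independent results.
    result["pooled"] = total_correct / total if total else None
    return result
```

For example, with training speakers `{"s1", "s2"}` and predictions `[("s1", "a", "a"), ("s2", "b", "a"), ("s3", "a", "a"), ("s3", "b", "b"), ("s3", "a", "b")]`, the function reports 0.5 dependent accuracy, about 0.667 independent accuracy, and 0.6 pooled. Only the separate independent figure corresponds to the kind of benchmark the survey argues for.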

Code Implementations: 1 repo