CV, AS · Oct 24, 2018

The speaker-independent lipreading play-off; a survey of lipreading machines

Jake Burton, David Frank, Madhi Saleh, Nassir Navab, Helen L. Bear

arXiv:1810.10597v1 · 3.9 · 12 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses speaker-independence, a key challenge in lipreading as a gesture classification task, but its contribution is incremental: it provides benchmarks rather than a new method.

The paper tackles speaker-independent lipreading through a systematic survey of experiments on the TCD-TIMIT dataset. Its best speaker-independent accuracy, 69.58% with CNN features and an SVM classifier, is lower than state-of-the-art speaker-dependent results but higher than previously reported speaker-independent results.
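The best pipeline pairs CNN features with an SVM classifier. The paper's exact SVM configuration (kernel, multi-class scheme, hyperparameters) is not given here, so the following is only a minimal sketch under stated assumptions: a binary linear SVM trained with the Pegasos stochastic sub-gradient method, with hand-made 2-D vectors standing in for CNN feature vectors. The function names and data are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    X: list of feature vectors (here, stand-ins for CNN features).
    y: labels in {-1, +1}.
    A constant 1.0 is appended to each vector so the bias is learned as a weight
    (it is regularised along with the weights, a common simplification).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            # Regularisation step: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only when the sample violates the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical toy "features": two well-separated clusters.
X = [(2, 2), (3, 1), (2.5, 3), (1.5, 2.5),
     (-2, -2), (-3, -1), (-2.5, -3), (-1.5, -2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, (2.2, 1.8)), predict(w, (-1.9, -2.1)))
```

A lipreading task has many classes, so a real system would need a multi-class extension such as one-vs-rest, and possibly a non-linear kernel; the linear binary case is kept here only to show the training rule.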

Lipreading is a difficult gesture classification task. One problem in computer lipreading is speaker-independence: achieving the same accuracy on test speakers who are not included in the training set as on speakers who are. The current literature on speaker-independent lipreading is limited, and the few independent test-speaker accuracy scores are usually aggregated with dependent test-speaker accuracies into an averaged performance, which makes the independent results unclear. Here we undertake a systematic survey of experiments with the TCD-TIMIT dataset, using both conventional approaches and deep learning methods, to provide a series of wholly speaker-independent benchmarks. We show that the best speaker-independent machine scores 69.58% accuracy with CNN features and an SVM classifier. This is less than state-of-the-art speaker-dependent lipreading machines achieve, but greater than previously reported results in speaker-independence experiments.
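The evaluation problem described above can be made concrete with a minimal Python sketch, under assumptions: (1) a speaker-disjoint train/test split, so no test speaker contributes training data, and (2) scoring seen and unseen speakers separately instead of as one pooled average. The function names, speaker IDs, and toy numbers are hypothetical and are not taken from the paper or from TCD-TIMIT.

```python
# Hypothetical records: (speaker_id, features, label). IDs and values are illustrative only.

def speaker_independent_split(samples, test_speakers):
    """Return (train, test) with every record of a test speaker held out of training."""
    held_out = set(test_speakers)
    train = [r for r in samples if r[0] not in held_out]
    test = [r for r in samples if r[0] in held_out]
    return train, test

def accuracy(pairs):
    """Fraction of (predicted, true) pairs that match."""
    return sum(p == t for p, t in pairs) / len(pairs)

def report(results, train_speakers):
    """Score speaker-dependent and speaker-independent test results separately.

    results: list of (speaker_id, predicted, true) for held-out utterances.
    A speaker counts as 'dependent' if any of their utterances were used in training.
    """
    seen = [(p, t) for spk, p, t in results if spk in train_speakers]
    unseen = [(p, t) for spk, p, t in results if spk not in train_speakers]
    pooled = [(p, t) for _, p, t in results]
    return accuracy(seen), accuracy(unseen), accuracy(pooled)

# Split: speaker "s3" is held out entirely.
samples = [("s1", [0.1], "a"), ("s1", [0.2], "b"), ("s2", [0.3], "a"), ("s3", [0.4], "b")]
train, test = speaker_independent_split(samples, ["s3"])
train_speakers = {spk for spk, _, _ in train}

# Toy results: 9/10 correct on a seen speaker, 5/10 on an unseen one.
results = ([("s1", "a", "a")] * 9 + [("s1", "a", "b")]
           + [("s3", "a", "a")] * 5 + [("s3", "a", "b")] * 5)
dependent, independent, pooled = report(results, train_speakers)
print(dependent, independent, pooled)  # 0.9 0.5 0.7
```

With these made-up numbers, the pooled 70% hides the fact that the unseen speaker scores only 50%, which is the kind of ambiguity that reporting wholly speaker-independent benchmarks avoids.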
