CV AS IV | Oct 26, 2017

Lip2AudSpec: Speech reconstruction from silent lip movements video

arXiv:1710.09798v1 | 93 citations
Originality: Incremental advance
AI Analysis

This work addresses speech reconstruction for applications such as assistive technologies and security. It is an incremental advance: it builds on existing lip-reading methods and adds a new spectral representation.

The authors tackled the problem of reconstructing intelligible speech from silent lip movement videos. Their autoencoder reconstructs the original auditory spectrogram with a 98% correlation, and the full model is reported to give superior word recognition accuracy.

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, which results in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram; these features are then used as the target for our main lip reading network, which comprises CNN, LSTM and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with a 98% correlation and also improves the quality of the speech reconstructed by the main lip reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results in reconstructing intelligible speech with superior word recognition accuracy.
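The 98% figure above is a correlation between an original and a reconstructed auditory spectrogram. The abstract does not say exactly how that correlation is computed, so the following is only a minimal sketch that assumes a plain Pearson correlation over all time-frequency bins. The function name `pearson_correlation` and the toy values are hypothetical and are not taken from the paper or its code.

```python
import math


def pearson_correlation(original, reconstructed):
    """Pearson correlation between two equally shaped spectrograms.

    Each spectrogram is a list of frames, and each frame is a list of
    per-channel magnitudes. Both are flattened before comparison.
    Assumption: the paper's metric may be computed differently.
    """
    if [len(f) for f in original] != [len(f) for f in reconstructed]:
        raise ValueError("spectrograms must have the same shape")
    x = [v for frame in original for v in frame]
    y = [v for frame in reconstructed for v in frame]
    if not x:
        raise ValueError("spectrograms must be non-empty")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant spectrogram")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical values): 3 time frames x 4 frequency channels.
original = [
    [0.10, 0.40, 0.90, 0.30],
    [0.20, 0.50, 0.80, 0.20],
    [0.00, 0.60, 0.70, 0.10],
]
# A slightly perturbed copy stands in for an autoencoder's output.
reconstructed = [
    [0.12, 0.38, 0.85, 0.33],
    [0.18, 0.52, 0.79, 0.22],
    [0.03, 0.57, 0.72, 0.08],
]

score = pearson_correlation(original, reconstructed)
print(f"correlation: {score:.3f}")
```

A value near 1 means the reconstruction tracks the original closely across bins. It does not by itself say anything about intelligibility, which the abstract assesses separately through word recognition accuracy.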

Code Implementations: 1 repo
