AS · CL · SD · Sep 30, 2022

Blind Signal Dereverberation for Machine Speech Recognition

arXiv:2210.00117v1 · h-index: 52
Originality: Synthesis-oriented
AI Analysis

This addresses speech recognition accuracy in reverberant environments, but appears incremental, as it builds on existing spectral processing techniques.

The paper tackles the removal of unknown convolutive noise that room reverberation introduces into recorded speech. Its method converts convolution into addition in the log spectral domain and applies spectral normalization there, producing dereverberated speech spectra intended to improve automatic speech recognition.
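The log-domain step can be written out as follows. This assumes an analysis window much longer than the room impulse response h(t), so that windowing edge effects are negligible; the frame index k, the symbol n(f), and the choice of mean log-magnitude statistics for the normalization vector are illustrative assumptions, not notation taken from the paper.

```latex
\begin{aligned}
x(t) &= (s * h)(t) \;\Rightarrow\; X_k(f) \approx S_k(f)\,H(f) \\
\log|X_k(f)| &\approx \log|S_k(f)| + \log|H(f)| \\
n(f) &= \frac{1}{K_r}\sum_{k=1}^{K_r} \log|X_k(f)| \;-\; \frac{1}{K_c}\sum_{k'=1}^{K_c} \log|S_{k'}(f)| \;\approx\; \log|H(f)| \\
\hat{S}_k(f) &= X_k(f)\, e^{-n(f)}
\end{aligned}
```

Here K_r and K_c are the numbers of reverberant and clean long-window frames. The approximation n(f) ≈ log|H(f)| relies on the two training sets having the same average log speech spectrum; they need not contain the same utterances. Scaling by the real factor e^{-n(f)} leaves the phase of X_k(f) unchanged, so the output is still a complex spectrum.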

We present a method to remove unknown convolutive noise introduced into speech by the reverberation of recording environments, using some amount of training speech data from the reverberant environment and any available non-reverberant speech data. Using Fourier transforms computed over long temporal windows, which ideally cover the entire room impulse response, we convert room-induced convolution into addition in the log spectral domain. Next, we compute a spectral normalization vector from statistics gathered over reverberated as well as clean speech in the log spectral domain. During operation, this normalization vector is used to alleviate reverberation in complex speech spectra recorded under the same reverberant conditions. The resulting dereverberated complex speech spectra are used to compute complex FDLP-spectrograms for automatic speech recognition.
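The training and operation stages described in the abstract can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the normalization vector is assumed to be the per-bin difference of mean log magnitudes over reverberant and clean training spectra, it is applied by scaling the complex spectra (phase untouched), the window and hop sizes are arbitrary, and white noise plus a synthetic exponentially decaying impulse response stand in for speech and a real room. All function names are hypothetical, and the FDLP-spectrogram stage is not shown.

```python
import numpy as np

def long_window_spectra(x, win_len=8192, hop=4096):
    """Complex spectra of x over long Hann windows, shape (frames, bins)."""
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    return np.fft.rfft(frames, axis=1)

def normalization_vector(reverb_spectra, clean_spectra, eps=1e-10):
    """Per-bin mean log magnitude of reverberant minus clean spectra.

    If each long-window spectrum satisfies X ~= S * H, this difference
    estimates log|H(f)|, the log magnitude response of the room.
    """
    mu_rev = np.mean(np.log(np.abs(reverb_spectra) + eps), axis=0)
    mu_clean = np.mean(np.log(np.abs(clean_spectra) + eps), axis=0)
    return mu_rev - mu_clean

def dereverberate(spectra, norm_vec):
    """Subtract norm_vec from the log magnitude; the phase is unchanged."""
    return spectra * np.exp(-norm_vec)

# Toy data (assumption): white noise stands in for speech, and a 50 ms
# exponentially decaying noise burst stands in for the room impulse response.
rng = np.random.default_rng(0)
fs = 16000
taps = np.arange(800)
rir = rng.standard_normal(800) * np.exp(-taps / 200.0)

# Training: clean and reverberant sets come from independent signals,
# since the method does not require parallel recordings.
train_clean = rng.standard_normal(30 * fs)
train_reverb = np.convolve(rng.standard_normal(30 * fs), rir)[:30 * fs]
norm_vec = normalization_vector(long_window_spectra(train_reverb),
                                long_window_spectra(train_clean))

# Operation: a new recording made in the same (synthetic) room.
test_reverb = np.convolve(rng.standard_normal(10 * fs), rir)[:10 * fs]
test_spectra = long_window_spectra(test_reverb)
clean_est = dereverberate(test_spectra, norm_vec)

# Compare the estimate with the true log|H(f)| on the same frequency grid.
true_log_h = np.log(np.abs(np.fft.rfft(rir, n=8192)))
print(np.corrcoef(norm_vec, true_log_h)[0, 1])
```

The printed value is the correlation between the estimated normalization vector and the synthetic room's true log magnitude response; with the window about ten times longer than the impulse response it should be close to 1. Shortening `win_len` toward the impulse-response length breaks the X ≈ S·H approximation, which is why the abstract stresses windows that cover the whole room response.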
