Robust coherence-based spectral enhancement for distant speech recognition
This work addresses speech recognition in noisy real-world settings; its contribution is incremental, as it builds on an existing baseline system.
The authors tackle distant speech recognition in noisy public environments by integrating a coherence-based Wiener filter into the front-end of the baseline system, which lowers word error rates in an evaluation on the CHiME-3 challenge data.
In this contribution to the 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3), we extend the acoustic front-end of the CHiME-3 baseline speech recognition system by a coherence-based Wiener filter, which is applied to the output signal of the baseline beamformer. To compute the time- and frequency-dependent postfilter gains, the ratio between direct and diffuse signal components at the output of the baseline beamformer is estimated and used as an approximation of the short-time signal-to-noise ratio. The proposed spectral enhancement technique is evaluated with respect to word error rates of the CHiME-3 challenge baseline speech recognition system using real speech recorded in public environments. Results confirm the effectiveness of the coherence-based postfilter when integrated into the front-end signal enhancement.
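
The postfilter step described above can be illustrated with a minimal sketch. It assumes the textbook Wiener gain G = SNR / (1 + SNR), with the estimated direct-to-diffuse ratio standing in for the short-time SNR in each time-frequency bin. The function names, the clipping of negative estimates, and the gain floor of 0.1 are illustrative assumptions and are not taken from the paper. The estimation of the ratio itself is not shown; the estimates are assumed to be given on a linear scale.

```python
def wiener_gain(cdr, gain_floor=0.1):
    """Map a direct-to-diffuse ratio estimate (linear scale) to a gain.

    Assumption: textbook Wiener rule G = SNR / (1 + SNR), with the
    ratio estimate used in place of the short-time SNR.
    """
    cdr = max(cdr, 0.0)  # assumption: clip negative estimates to zero
    gain = cdr / (1.0 + cdr)
    # Assumption: a lower bound on the gain limits speech distortion.
    return max(gain, gain_floor)


def apply_postfilter(beamformer_stft, cdr_estimates, gain_floor=0.1):
    """Scale each time-frequency bin of the beamformer output by its gain.

    Both arguments are lists of frames; each frame is a list with one
    value per frequency bin (complex spectrum and ratio estimate).
    """
    return [
        [wiener_gain(c, gain_floor) * x for x, c in zip(frame, cdr_frame)]
        for frame, cdr_frame in zip(beamformer_stft, cdr_estimates)
    ]
```

In this sketch, a bin dominated by the direct component (large ratio) passes almost unchanged, while a bin dominated by diffuse noise (ratio near zero) is attenuated down to the floor.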