DNN-based uncertainty estimation for weighted DNN-HMM ASR
This work addresses uncertainty estimation for speech recognition systems, although it appears incremental, since it builds on existing DNN-HMM frameworks.
The paper tackles the problem of uncertainty estimation in speech recognition by training a DNN to predict uncertainty from enhanced noisy observations, which is then used in a weighted DNN-HMM system. Results show comparisons with an existing method on the Aurora-4 task under various training conditions.
In this paper, the uncertainty is defined as the mean square error between a given enhanced noisy observation vector and the corresponding clean one. A DNN is then trained on a training database, with enhanced noisy observation vectors as input and the uncertainty as the target output. In testing, the DNN receives an enhanced noisy observation vector and delivers the estimated uncertainty. This uncertainty is employed in a weighted DNN-HMM speech recognition system and compared with an existing estimate of the noise-cancellation uncertainty variance based on an additive noise model. Experiments were carried out on the Aurora-4 task, and results are presented for clean, multi-noise, and multi-condition training.
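To make the summarized training and testing procedure concrete, here is a minimal sketch; it is not the authors' implementation. It assumes the uncertainty is one scalar per frame (the MSE averaged over feature dimensions; a per-dimension variant would drop the mean), and it stands in a one-hidden-layer ReLU network trained with full-batch gradient descent for the paper's DNN. The function names (`uncertainty_target`, `train_uncertainty_mlp`, `predict_uncertainty`), the log-domain target, and the synthetic data are all hypothetical choices made for illustration. How the estimate weights the DNN-HMM decoder is not shown.

```python
import numpy as np


def uncertainty_target(enhanced: np.ndarray, clean: np.ndarray) -> np.ndarray:
    """Per-frame uncertainty as the MSE between enhanced and clean feature vectors.

    Both inputs have shape (num_frames, feat_dim); the result has shape (num_frames,).
    """
    if enhanced.shape != clean.shape:
        raise ValueError("enhanced and clean features must have the same shape")
    return np.mean((enhanced - clean) ** 2, axis=1)


def train_uncertainty_mlp(x, u, hidden=32, lr=0.01, epochs=300, eps=1e-6, seed=0):
    """Fit a one-hidden-layer ReLU network mapping enhanced features x to log(u + eps).

    Predicting the log (an assumption, not from the paper) keeps the
    exponentiated estimate strictly positive. Returns the parameters and
    the training loss recorded at each epoch before the update.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    y = np.log(u + eps)
    w1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, np.sqrt(1.0 / hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h_pre = x @ w1 + b1
        h = np.maximum(h_pre, 0.0)
        pred = (h @ w2 + b2).ravel()
        err = pred - y
        losses.append(0.5 * np.mean(err ** 2))
        # Backward pass for the loss 0.5 * mean(err^2).
        g_out = err[:, None] / n
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ w2.T) * (h_pre > 0)
        g_w1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Plain full-batch gradient descent update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


def predict_uncertainty(params, x):
    """Estimate the uncertainty for enhanced feature vectors at test time."""
    w1, b1, w2, b2 = params
    h = np.maximum(x @ w1 + b1, 0.0)
    return np.exp((h @ w2 + b2).ravel())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.normal(size=(500, 8))
    # Hypothetical enhancement residual whose size depends on the first feature.
    scale = 0.1 + 0.5 * np.abs(clean[:, :1])
    enhanced = clean + scale * rng.normal(size=clean.shape)
    u = uncertainty_target(enhanced, clean)
    params, losses = train_uncertainty_mlp(enhanced, u)
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
    print(predict_uncertainty(params, enhanced[:3]))
```

Only enhanced features enter `predict_uncertainty`, mirroring the test condition described above, while the clean features are needed solely to build the training targets.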