CL | SD | AS | ML | Jul 22, 2019

On Modeling ASR Word Confidence

arXiv:1907.09636v4 | 7 citations
Originality: Incremental advance
AI Analysis

This work addresses ASR errors that degrade downstream applications of speech recognition systems. The contribution appears incremental because it builds on existing Word Confusion Networks.

The authors improve ASR word confidence estimation by introducing a Heterogeneous Word Confusion Network (HWCN) and a score calibration method. The word sequence with the best overall confidence is more accurate than the recognizer's default 1-best result, and calibration makes recognizer combination more reliable.

We present a new method for computing ASR word confidences that effectively mitigates the effect of ASR errors for diverse downstream applications, improves the word error rate of the 1-best result, and allows better comparison of scores across different models. We propose 1) a new method for modeling word confidence using a Heterogeneous Word Confusion Network (HWCN) that addresses some key flaws in conventional Word Confusion Networks, and 2) a new score calibration method for facilitating direct comparison of scores from different models. Using a bidirectional lattice recurrent neural network to compute the confidence scores of each word in the HWCN, we show that the word sequence with the best overall confidence is more accurate than the default 1-best result of the recognizer, and that the calibration method can substantially improve the reliability of recognizer combination.
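To make the idea of "the word sequence with the best overall confidence" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a confusion network represented as a directed acyclic graph whose arcs carry a word and a confidence in (0, 1], that node ids are integers in topological order, and that a path's overall confidence is the product of its arc confidences. The function name `best_confidence_path`, the arc layout, and the toy scores are all hypothetical; in the paper the per-word confidences come from a bidirectional lattice recurrent neural network.

```python
import math
from collections import defaultdict


def best_confidence_path(arcs, start, end):
    """Return (words, confidence) for the highest-confidence path.

    arcs: list of (src, dst, word, confidence) tuples.
    Assumptions (not from the paper): src < dst for every arc, so sorted
    node ids give a topological order; confidences lie in (0, 1]; a path's
    score is the product of its arc confidences.
    """
    outgoing = defaultdict(list)
    nodes = set()
    for src, dst, word, conf in arcs:
        outgoing[src].append((dst, word, conf))
        nodes.update((src, dst))

    # best[node] = (log-confidence of best path from start, words on it)
    best = {start: (0.0, [])}
    for node in sorted(nodes):
        if node not in best:
            continue  # node is unreachable from start
        score, words = best[node]
        for dst, word, conf in outgoing[node]:
            candidate = score + math.log(conf)
            if dst not in best or candidate > best[dst][0]:
                best[dst] = (candidate, words + [word])

    score, words = best[end]
    return words, math.exp(score)


# Toy network: arcs may span different numbers of nodes, unlike the
# fixed slots of a conventional word confusion network.
toy_arcs = [
    (0, 1, "i", 0.9),
    (1, 3, "scream", 0.5),
    (0, 2, "ice", 0.8),
    (2, 3, "cream", 0.9),
]
words, confidence = best_confidence_path(toy_arcs, start=0, end=3)
print(words, round(confidence, 2))
```

Multiplying confidences tends to favor paths with fewer words, so a real system might normalize by path length or use a different aggregate; that choice is left open here.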

