CL SD AS May 21, 2023

Hystoc: Obtaining word confidences for fusion of end-to-end ASR systems

arXiv:2305.12579v1
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in fusing ASR systems, but it is incremental because it builds on existing n-best and confusion network techniques.

The paper tackles the problem of obtaining well-calibrated word-level confidences for end-to-end ASR systems. It proposes Hystoc, an iterative alignment method that converts n-best hypotheses into a confusion network and reads word confidences off as posterior probabilities. Using these confidences when fusing multiple systems increases the fusion gains by up to 1% absolute WER on the Spanish RTVE2020 dataset.

End-to-end (e2e) systems have recently gained wide popularity in automatic speech recognition. However, these systems generally do not provide well-calibrated word-level confidences. In this paper, we propose Hystoc, a simple method for obtaining word-level confidences from hypothesis-level scores. Hystoc is an iterative alignment procedure that turns the hypotheses in an n-best output of the ASR system into a confusion network. Word-level confidences are then obtained as posterior probabilities in the individual bins of the confusion network. We show that Hystoc provides confidences that correlate well with the accuracy of the ASR hypothesis. Furthermore, we show that utilizing Hystoc in the fusion of multiple e2e ASR systems increases the gains from the fusion by up to 1% absolute WER on the Spanish RTVE2020 dataset. Finally, we experiment with using Hystoc for direct fusion of n-best outputs from multiple systems, but we achieve only minor gains when fusing very similar systems.
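The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax normalisation of hypothesis scores, the unit edit costs, the merge order (best-scoring hypothesis first), and all names (`build_confusion_network`, `align`, `best_path`, `EPS`) are assumptions made for illustration only.

```python
import math

EPS = "<eps>"  # hypothetical marker for "no word" in a bin


def softmax(scores):
    """Turn hypothesis-level log scores into posteriors (an assumed normalisation)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(bins, words):
    """Edit-distance alignment of a word sequence to the current bins.

    Matching a word already present in a bin costs 0; anything else costs 1.
    Returns (bin_index or None, word or None) pairs in left-to-right order.
    """
    n, m = len(bins), len(words)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if words[j - 1] in bins[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if words[j - 1] in bins[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((i - 1, words[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))  # hypothesis has no word for this bin
            i -= 1
            continue
        pairs.append((None, words[j - 1]))  # word needs a new bin
        j -= 1
    pairs.reverse()
    return pairs


def build_confusion_network(nbest):
    """nbest: list of (word_list, log_score). Returns a list of bins (word -> posterior)."""
    posts = softmax([score for _, score in nbest])
    order = sorted(range(len(nbest)), key=lambda k: -posts[k])
    first = order[0]
    bins = [{w: posts[first]} for w in nbest[first][0]]
    seen = posts[first]  # posterior mass already merged into the network
    for k in order[1:]:
        merged = []
        for bin_idx, word in align(bins, nbest[k][0]):
            if bin_idx is None:
                # All previously merged hypotheses had no word at this position.
                merged.append({EPS: seen, word: posts[k]})
            else:
                new_bin = dict(bins[bin_idx])
                key = EPS if word is None else word
                new_bin[key] = new_bin.get(key, 0.0) + posts[k]
                merged.append(new_bin)
        bins = merged
        seen += posts[k]
    return bins


def best_path(bins):
    """Pick the top entry of each bin; its posterior serves as the word confidence."""
    picked = [max(b.items(), key=lambda kv: kv[1]) for b in bins]
    return [(w, p) for w, p in picked if w != EPS]


nbest = [("the cat sat".split(), -1.0),
         ("the cat sat down".split(), -2.0),
         ("a cat sat".split(), -3.0)]
network = build_confusion_network(nbest)
for word, confidence in best_path(network):
    print(f"{word}\t{confidence:.3f}")
```

In this toy example every hypothesis agrees on "cat" and "sat", so those words receive a confidence close to 1, while "the" is penalised by the competing "a". Because each bin sums to the total posterior mass, the same structure could in principle accept hypotheses from several systems, which is the idea behind the direct n-best fusion mentioned in the abstract.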

