CL, SD, AS · Mar 14, 2022

RED-ACE: Robust Error Detection for ASR using Confidence Embeddings

DeepMind
arXiv:2203.07172v3 · 291 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving error detection in ASR systems for applications such as transcription services. It is an incremental advance that builds on existing text-based methods.

The paper tackles the problem of detecting errors in automatic speech recognition (ASR) transcriptions by incorporating the ASR system's word-level confidence scores. Experiments on a newly released dataset show that this improves both detection performance and robustness.
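Training and evaluating an AED model requires a binary label for each word in the ASR hypothesis. Below is a minimal sketch, assuming the labels come from a word-level Levenshtein alignment between the reference transcript and the hypothesis, with substituted and inserted words marked as errors. The helper `label_hypothesis_errors` is hypothetical and does not reproduce the paper's annotation code.

```python
from typing import List


def label_hypothesis_errors(reference: List[str], hypothesis: List[str]) -> List[int]:
    """Hypothetical helper: one label per hypothesis word (1 = error, 0 = correct).

    Assumes a standard word-level Levenshtein alignment. Substituted and
    inserted hypothesis words count as errors; deleted reference words have
    no hypothesis token to attach a label to, so they are not labeled here.
    """
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion (reference word missing)
                           dp[i][j - 1] + 1)         # insertion (extra hypothesis word)

    # Backtrace from the bottom-right cell to recover one optimal alignment.
    labels = [0] * m
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1]):
            labels[j - 1] = int(reference[i - 1] != hypothesis[j - 1])  # 1 if substituted
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            labels[j - 1] = 1  # inserted word
            j -= 1
        else:
            i -= 1  # deleted reference word: nothing to label
    return labels


# Usage: "cat" was misrecognized as "bat" and the second "the" was dropped.
print(label_hypothesis_errors("the cat sat on the mat".split(), "the bat sat on mat".split()))
# → [0, 1, 0, 0, 0]
```

Because deletions leave no hypothesis token behind, a per-token tagger of this kind can only flag substitutions and insertions directly.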

ASR Error Detection (AED) models aim to post-process the output of Automatic Speech Recognition (ASR) systems, in order to detect transcription errors. Modern approaches usually use text-based input, consisting solely of the ASR transcription hypothesis, disregarding additional signals from the ASR model. Instead, we propose to utilize the ASR system's word-level confidence scores for improving AED performance. Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model's encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. Our experiments show the benefits of ASR confidence scores for AED, their complementary effect over the textual signal, as well as the effectiveness and robustness of ACE for combining these signals. To foster further research, we publish a novel AED dataset consisting of ASR outputs on the LibriSpeech corpus with annotated transcription errors.
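One way to picture the ACE layer is sketched below. This is an illustration under stated assumptions, not the paper's implementation: we assume each word-level confidence score is quantized into a fixed number of bins, each bin owns a learned vector, and that vector is added to the text embedding of every subword token belonging to the word before the encoder runs. `ConfidenceEmbedding`, `encoder_inputs`, `num_bins`, and `dim` are hypothetical names, and plain Python lists stand in for tensors.

```python
import random
from typing import List


class ConfidenceEmbedding:
    """Hypothetical ACE-style layer: bin a confidence score, look up a vector."""

    def __init__(self, num_bins: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_bins = num_bins
        # One small random vector per bin; in a real model these would be trained.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_bins)]

    def bin_index(self, confidence: float) -> int:
        # Clamp to [0, 1], then map to {0, ..., num_bins - 1}; 1.0 lands in the top bin.
        c = min(max(confidence, 0.0), 1.0)
        return min(int(c * self.num_bins), self.num_bins - 1)

    def __call__(self, confidence: float) -> List[float]:
        return self.table[self.bin_index(confidence)]


def encoder_inputs(token_embs: List[List[float]], word_ids: List[int],
                   word_confidences: List[float], ace: ConfidenceEmbedding) -> List[List[float]]:
    """Add each token's confidence embedding to its text embedding (assumed design).

    word_ids[k] is the index of the ASR word that token k belongs to, so all
    subword pieces of one word share that word's confidence score.
    """
    out = []
    for emb, w in zip(token_embs, word_ids):
        conf_vec = ace(word_confidences[w])
        out.append([e + c for e, c in zip(emb, conf_vec)])
    return out


# Usage: three subword tokens; tokens 1 and 2 are pieces of the same ASR word.
ace = ConfidenceEmbedding(num_bins=10, dim=4)
token_embs = [[0.1, 0.2, 0.3, 0.4]] * 3  # toy, identical text embeddings
inputs = encoder_inputs(token_embs, [0, 1, 1], [0.98, 0.41], ace)
```

Binning turns a continuous score into a discrete id, so the layer behaves like a positional or segment embedding table. Adding the vector rather than concatenating it keeps the encoder's input width unchanged, which lets a pretrained text encoder be reused as-is.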

Code Implementations: 1 repo
