CL SD AS Jan 3, 2024

Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models

arXiv:2401.01572v1, 34 citations, h-index: 6
Originality: Incremental advance
AI Analysis

This paper addresses a critical issue for ASR system reliability by identifying deceptive errors that standard metrics miss. It is an incremental advance, since it builds on prior NLP hallucination research.

The paper tackles the problem of hallucinations in automatic speech recognition (ASR), which it defines as fluent but semantically unrelated transcriptions. It proposes a perturbation-based method to assess a model's susceptibility to hallucination and shows that the method distinguishes hallucinatory from non-hallucinatory models that have similar word error rates.

Hallucinations are a type of output error produced by deep neural networks. While they have been studied in natural language processing, they have not previously been researched in automatic speech recognition (ASR). Here, we define hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. The similarity of hallucinations to probable natural language outputs of the model creates a danger of deception and impacts the credibility of the system. We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models. To address this, we propose a perturbation-based method for assessing the susceptibility of an ASR model to hallucination at test time, which does not require access to the training dataset. We demonstrate that this method helps to distinguish between hallucinatory and non-hallucinatory models that have similar baseline word error rates. We further explore the relationship between the types of ASR errors and the types of dataset noise to determine what types of noise are most likely to create hallucinatory outputs. We devise a framework for identifying hallucinations by analysing their semantic connection with the ground truth and their fluency. Finally, we discover how to induce hallucinations by injecting random noise into the utterance.
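The abstract does not give the details of the perturbation or the scoring, so the following is only a minimal sketch of two ingredients it mentions: word error rate and random noise injection. The function names `word_error_rate` and `add_noise`, the white Gaussian noise model, and the signal-to-noise parameter `snr_db` are assumptions made for illustration, not the paper's method.

```python
import math
import random


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def add_noise(samples, snr_db, rng=None):
    """Add white Gaussian noise to a waveform at a target SNR in dB.

    Assumption: the perturbation is additive Gaussian noise; the source
    only says "random noise injection".
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    rng = rng or random.Random()
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


reference = "the cat sat on the mat"
garbled = "thee kat sadd onn thuh matt"        # non-fluent, sound-alike errors
hallucinated = "i will see you next week"      # fluent but unrelated

# Both hypotheses get the same score, although only one is a hallucination.
print(word_error_rate(reference, garbled))
print(word_error_rate(reference, hallucinated))
```

In this toy case the two hypotheses receive identical word error rates, which illustrates why a separate check on semantic relatedness and fluency is needed. A susceptibility test along the lines the abstract describes would transcribe both the clean waveform and the output of `add_noise`, then inspect how the transcriptions change.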
