Unifying Global and Near-Context Biasing in a Single Trie Pass
This work addresses two challenges, adapting automatic speech recognition (ASR) systems to new domains and improving rare-word recognition for users, though it appears incremental because it builds on existing biasing strategies.
The paper tackled the problem of recognizing rare and out-of-vocabulary words, such as named entities, in automatic speech recognition by combining a named entity bias list with a word-level n-gram language model. The result was up to a 32% relative improvement in entity recognition and up to a 12% relative reduction in overall word error rate across three datasets in four languages.
Despite the success of end-to-end automatic speech recognition (ASR) models, challenges persist in recognizing rare and out-of-vocabulary words, including named entities (NEs), and in adapting to new domains using only text data. This work presents a practical approach to address these challenges through a previously unexplored combination of an NE bias list and a word-level n-gram language model (LM). This solution balances simplicity and effectiveness, improving entity recognition while maintaining or even enhancing overall ASR performance. We efficiently integrate this enriched biasing method into a transducer-based ASR system, enabling context adaptation with almost no computational overhead. We present our results on three datasets spanning four languages and compare them to state-of-the-art biasing strategies. We demonstrate that the proposed combination of keyword biasing and n-gram LM improves entity recognition by up to 32% relative and reduces overall word error rate (WER) by up to 12% relative.
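The abstract does not spell out how the bias list and the n-gram LM are combined, so the following is only a minimal sketch of one way a single word-level trie could serve both roles. It assumes that n-gram log-probabilities and entity markers are stored on the same trie nodes, so that one traversal per word returns both signals. The class and function names, the weights (`lm_weight`, `entity_bonus`, `unk_logprob`), and the toy vocabulary are all hypothetical, and the sketch omits backoff weights and the transducer beam search entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    ngram_logprob: Optional[float] = None  # set when the path is an n-gram in the LM
    is_entity: bool = False                # set when the path is a full bias-list entity


class BiasingTrie:
    """Hypothetical word-level trie holding n-gram scores and bias entities together."""

    def __init__(self, lm_weight: float = 0.5, entity_bonus: float = 2.0,
                 unk_logprob: float = -8.0) -> None:
        self.root = TrieNode()
        self.lm_weight = lm_weight        # assumed interpolation weight for the LM score
        self.entity_bonus = entity_bonus  # assumed flat bonus for completing an entity
        self.unk_logprob = unk_logprob    # assumed score when no n-gram matches

    def _walk_or_create(self, words: Sequence[str]) -> TrieNode:
        node = self.root
        for w in words:
            node = node.children.setdefault(w, TrieNode())
        return node

    def add_ngram(self, words: Sequence[str], logprob: float) -> None:
        self._walk_or_create(words).ngram_logprob = logprob

    def add_entity(self, words: Sequence[str]) -> None:
        self._walk_or_create(words).is_entity = True

    def score_hypothesis(self, words: Sequence[str]) -> float:
        """Score a word sequence with one trie step per active partial match."""
        total = 0.0
        active: List[TrieNode] = []  # partial matches, longest first
        for w in words:
            next_active: List[TrieNode] = []
            lm_score: Optional[float] = None
            bonus = 0.0
            # Extend every partial match, then try starting a new match at the root.
            for node in active + [self.root]:
                child = node.children.get(w)
                if child is None:
                    continue
                next_active.append(child)
                # The first hit comes from the longest matching context.
                if lm_score is None and child.ngram_logprob is not None:
                    lm_score = child.ngram_logprob
                if child.is_entity:
                    bonus = self.entity_bonus
            if lm_score is None:
                lm_score = self.unk_logprob
            total += self.lm_weight * lm_score + bonus
            active = next_active
        return total


# Toy data, invented for illustration only.
trie = BiasingTrie()
trie.add_ngram(["call"], -2.0)
trie.add_ngram(["call", "anna"], -1.0)
trie.add_ngram(["anna"], -6.0)
trie.add_entity(["anna", "kowalska"])

biased = trie.score_hypothesis(["call", "anna", "kowalska"])
unbiased = trie.score_hypothesis(["call", "anna", "kovalska"])
print(biased, unbiased)
```

In this toy setup neither final word is covered by the n-gram LM, so both hypotheses receive the same LM score, and the one that completes the bias-list entity wins by exactly the entity bonus. A real system would apply such scores incrementally inside beam search rather than to complete hypotheses.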