CL, AI, SD, AS
Aug 4, 2023

N-gram Boosting: Improving Contextual Biasing with Normalized N-gram Targets

arXiv:2308.02092v1
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific challenge in speech-to-text applications for business conversations, where rare terms are critical but often misrecognized. It is an incremental advance that builds on existing contextual biasing methods.

The paper addresses the accurate transcription of rare proper names and technical terms in speech-to-text for business conversations. It introduces a two-step keyword boosting mechanism that operates on normalized n-grams, improving the keyword recognition rate by 26% relative on a proprietary in-domain dataset and by 2% on LibriSpeech.

Accurate transcription of proper names and technical terms is particularly important in speech-to-text applications for business conversations. These words, which are essential to understanding the conversation, are often rare and therefore likely to be under-represented in text and audio training data, creating a significant challenge in this domain. We present a two-step keyword boosting mechanism that works on normalized unigrams and n-grams rather than just single tokens, which eliminates the missed hits that occur when boosting raw targets. In addition, we show how adjusting the boosting weight logic avoids over-boosting multi-token keywords. This improves our keyword recognition rate by 26% relative on our proprietary in-domain dataset and by 2% on LibriSpeech. The method is particularly useful for targets that involve non-alphabetic characters or have non-standard pronunciations.
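The abstract does not spell out the mechanism, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy n-best rescoring setup: keywords are normalized into token n-grams (lowercased, digits spelled out, non-alphabetic characters dropped), and each matched n-gram earns one flat bonus regardless of how many tokens it contains. The function names, the digit table, the `boost` value, and the flat-bonus rule are all hypothetical choices made for illustration.

```python
import re

# Hypothetical digit table used to spell out numerals during normalization.
_DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize(text):
    """Lowercase, spell out digits, and split on non-alphabetic characters."""
    text = text.lower()
    text = re.sub(r"\d", lambda m: f" {_DIGITS[m.group()]} ", text)
    text = re.sub(r"[^a-z']+", " ", text)
    return tuple(text.split())


def build_targets(keywords):
    """Step 1 (assumed): turn raw keywords into normalized n-gram targets."""
    targets = set()
    for keyword in keywords:
        ngram = normalize(keyword)
        if ngram:
            targets.add(ngram)
    return targets


def count_hits(tokens, targets):
    """Count occurrences of each whole target n-gram in a token sequence."""
    hits = 0
    for target in targets:
        n = len(target)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == target:
                hits += 1
    return hits


def rescore(nbest, keywords, boost=2.0):
    """Step 2 (assumed): add one flat bonus per matched n-gram, then pick the best.

    The bonus is per n-gram, not per token, so a five-token keyword is not
    boosted five times as much as a single-token keyword.
    """
    targets = build_targets(keywords)
    scored = []
    for text, score in nbest:
        bonus = boost * count_hits(normalize(text), targets)
        scored.append((text, score + bonus))
    return max(scored, key=lambda pair: pair[1])[0]


# Made-up hypotheses and log scores, for illustration only.
nbest = [
    ("we shipped the acne three zero zero zero", -1.0),
    ("we shipped the acme three zero zero zero", -1.5),
]
best = rescore(nbest, ["Acme-3000"])
```

In this toy example the raw keyword string "Acme-3000" never appears verbatim in either hypothesis, so matching on the raw target would miss it. The normalized n-gram does match the second hypothesis, which then outranks the first despite its lower starting score.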
