CL SD AS | Jun 2, 2025

WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing

arXiv:2506.01263v1 | 2.7 | h-index: 1 | INTERSPEECH

Originality: Incremental advance

AI Analysis

The method targets rare-word recognition in ASR systems, which matters most for users whose speech contains proper nouns. The advance is incremental: it builds on existing CTC-based models and applies to them without retraining.

The paper addresses the tendency of end-to-end speech recognition models to favor their training vocabulary, which leads to inaccurate recognition of proper nouns and unknown terms. It proposes a retraining-free method that combines wildcard CTC-based keyword spotting with inter-layer biasing, and reports a 29% improvement in F1 score for unknown words in Japanese speech recognition.

Despite recent advances in end-to-end speech recognition methods, the output tends to be biased to the training data's vocabulary, resulting in inaccurate recognition of proper nouns and other unknown terms. To address this issue, we propose a method to improve recognition accuracy of such rare words in CTC-based models without additional training or text-to-speech systems. Specifically, keyword spotting is performed using acoustic features of intermediate layers during inference, and a bias is applied to the subsequent layers of the acoustic model for detected keywords. For keyword detection, we adopt a wildcard CTC that is both fast and tolerant of ambiguous matches, allowing flexible handling of words that are difficult to match strictly. Since this method does not require retraining of existing models, it can be easily applied to even large-scale models. In experiments on Japanese speech recognition, the proposed method achieved a 29% improvement in the F1 score for unknown words.
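The keyword-spotting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a best-path (Viterbi) variant of a wildcard-style CTC score, in which frames before and after the keyword are absorbed by a wildcard at zero cost, so the keyword may start and end at any frame. The function name `wildcard_ctc_score`, the blank index `0`, and the toy posteriors are all hypothetical. The inter-layer biasing step is not shown.

```python
import numpy as np

def wildcard_ctc_score(log_probs, keyword, blank=0):
    """Best-path log score of `keyword` occurring anywhere in the utterance.

    log_probs: (T, V) array of per-frame log posteriors (e.g. from an
               intermediate CTC layer; this is an assumption of the sketch).
    keyword:   non-empty list of non-blank token ids.
    """
    T = log_probs.shape[0]
    # Extended label sequence: blank, k1, blank, k2, ..., kn, blank
    ext = [blank]
    for tok in keyword:
        ext += [tok, blank]
    S = len(ext)

    alpha = np.full(S, -np.inf)  # best score of paths ending in each state
    best = -np.inf
    for t in range(T):
        new = np.full(S, -np.inf)
        for s in range(S):
            cand = alpha[s]                      # stay in the same state
            if s >= 1:
                cand = max(cand, alpha[s - 1])   # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cand = max(cand, alpha[s - 2])   # skip the blank between tokens
            if s <= 1:
                cand = max(cand, 0.0)            # leading wildcard: start here for free
            new[s] = cand + log_probs[t, ext[s]]
        alpha = new
        # Trailing wildcard: the keyword may end at this frame
        best = max(best, alpha[S - 1], alpha[S - 2])
    return float(best)

# Toy posteriors over {blank=0, token 1, token 2} for four frames.
toy = np.log(np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]))
score_present = wildcard_ctc_score(toy, [1, 2])  # tokens appear in this order
score_absent = wildcard_ctc_score(toy, [2, 1])   # reversed order, poorly supported
print(score_present, score_absent)
```

In a detector built on this sketch, a keyword would be flagged when its score exceeds a threshold, possibly after normalizing by keyword length. The threshold and normalization are design choices that the abstract does not specify.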
