CL, AI, SD, AS | Jun 4, 2023

SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings

arXiv:2306.02317v1 | 8 citations | h-index: 32
Originality: Highly original
AI Analysis

This work addresses the need for efficient and accurate ASR customization for users with large vocabularies. It represents a strong, specific gain rather than a foundational advance.

The paper tackles the problem of improving automatic speech recognition (ASR) quality for user-specific vocabularies. It proposes SpellMapper, a non-autoregressive neural spellchecker with a novel candidate retrieval method, and reports a 21.4% word error rate improvement over a baseline ASR system on Spoken Wikipedia.

Contextual spelling correction models are an alternative to shallow fusion for improving automatic speech recognition (ASR) quality given a user vocabulary. To deal with large user vocabularies, most of these models include candidate retrieval mechanisms, usually based on minimum edit distance between fragments of the ASR hypothesis and user phrases. However, the edit-distance approach is slow, non-trainable, and may have low recall as it relies only on common letters. We propose: 1) a novel algorithm for candidate retrieval, based on misspelled n-gram mappings, which gives up to 90% recall with just the top 10 candidates on Spoken Wikipedia; 2) a non-autoregressive neural model based on the BERT architecture, where the initial transcript and ten candidates are combined into one input. The experiments on Spoken Wikipedia show a 21.4% word error rate improvement compared to a baseline ASR system.
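
To make the idea of retrieval via misspelled n-gram mappings more concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify how mappings are learned or how candidates are scored, so the function names (char_ngrams, build_index, retrieve), the additive scoring scheme, and the toy mapping weights are all assumptions made for illustration. The sketch indexes every character n-gram of each user phrase, together with its known misspelled variants, and then ranks phrases by the summed weight of the n-grams they share with the ASR hypothesis.

```python
def char_ngrams(text, max_n=3):
    """Yield every character n-gram of length 1..max_n found in text."""
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def build_index(user_phrases, ngram_mappings, max_n=3):
    """Build an inverted index: observed n-gram -> {user phrase: weight}.

    ngram_mappings is a hypothetical table of the form
    {correct n-gram: {misspelled n-gram: weight}}. In a trained system
    these weights would be learned from data; here they are toy values.
    """
    index = {}
    for phrase in user_phrases:
        for gram in char_ngrams(phrase, max_n):
            # Each n-gram maps to itself with weight 1.0, plus any
            # known misspelled variants from the mapping table.
            variants = {gram: 1.0}
            variants.update(ngram_mappings.get(gram, {}))
            for observed, weight in variants.items():
                bucket = index.setdefault(observed, {})
                bucket[phrase] = max(bucket.get(phrase, 0.0), weight)
    return index


def retrieve(hypothesis, index, top_k=10, max_n=3):
    """Return up to top_k user phrases ranked by summed n-gram weight."""
    scores = {}
    for gram in char_ngrams(hypothesis, max_n):
        for phrase, weight in index.get(gram, {}).items():
            scores[phrase] = scores.get(phrase, 0.0) + weight
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


# Toy usage with an assumed mapping: "vi" is sometimes transcribed as "vy".
phrases = ["nvidia", "tensorrt", "pytorch"]
mappings = {"vi": {"vy": 0.4}}
index = build_index(phrases, mappings)
candidates = retrieve("envydia", index, top_k=2)
```

Unlike pairwise edit distance, this lookup touches only the phrases that share at least one indexed n-gram with the hypothesis, and the mapping weights are a natural place to plug in learned statistics. A practical system would also need to normalize for phrase length and restrict matching to fragments of the hypothesis; the sketch does neither.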

Code Implementations: 1 repo