CLAILGOct 24, 2023

Accented Speech Recognition With Accent-specific Codebooks

DeepMind
arXiv:2310.15970v3132 citationsh-index: 30
Originality Incremental advance
AI Analysis

This addresses the challenge of inclusive adoption of ASR systems by improving performance across diverse accents, though it is incremental as it builds on existing end-to-end ASR methods.

The paper tackled the problem of speech recognition performance degradation for underrepresented accents by proposing an accent adaptation approach using trainable codebooks, achieving up to 37% relative improvement in word error rate on seen accents and up to 5% on unseen accents.

Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contained accents which were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to $37\%$ relative improvement in word error rate) but also on the unseen accents (up to $5\%$ relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2Artic dataset. We also compare the performance with other approaches based on accent adversarial training.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes