SD · LG · May 5

Contrastive Regularization for Accent-Robust ASR

arXiv:2605.03297
Predicted impact: top 73% in SD (last 90 days) · Originality: incremental advance
AI Analysis

For ASR systems that struggle with accent variability, this work offers a model-agnostic regularization method that improves robustness without architectural changes or accent labels.

The paper proposes using supervised contrastive learning as a lightweight auxiliary objective during CTC fine-tuning to improve accent robustness in ASR, achieving up to 25-29% relative WER reduction on unseen accents in the L2-ARCTIC benchmark.

ASR systems based on self-supervised acoustic pretraining and CTC fine-tuning achieve strong performance on native speech but remain sensitive to accent variability. We investigate supervised contrastive learning (SupCon) as a lightweight, accent-invariant auxiliary objective for CTC fine-tuning. An utterance-level contrastive loss regularizes encoder representations without architectural modification or explicit accent supervision. Experiments on the L2-ARCTIC benchmark show consistent WER reductions across multiple pretrained encoders, with up to 25-29% relative reduction under unseen-accent evaluation. Analysis using within-transcript cosine dispersion indicates that SupCon promotes more compact and stable representation geometry under accent variability. Overall, SupCon provides an effective and model-agnostic regularization strategy for improving accent robustness.
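
As a rough illustration of the idea in the abstract, the sketch below computes a supervised contrastive loss over utterance-level embeddings and adds it to a CTC loss with a weighting factor. It is a minimal, framework-free sketch and not the authors' implementation. Several details are assumptions the abstract does not state: that positives are utterances sharing the same transcript, that each embedding is a pooled encoder output, the standard SupCon formulation with a temperature, the form of the dispersion measure (mean pairwise cosine distance within a transcript group), and the hypothetical names `supcon_loss`, `within_group_dispersion`, `total_loss`, `temperature`, and `weight`.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes a non-zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over utterance-level embeddings.

    Assumption: labels[i] is a transcript ID, so utterances with the same
    transcript (possibly spoken with different accents) act as positives.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without any positive are skipped
        # Scaled cosine similarity to every other utterance in the batch.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def within_group_dispersion(embeddings, labels):
    """Mean pairwise cosine distance (1 - cos) among same-label embeddings.

    Assumption: one plausible reading of "within-transcript cosine
    dispersion"; the paper's exact definition may differ.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    dists = [
        1.0 - dot(z[i], z[j])
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] == labels[j]
    ]
    return sum(dists) / len(dists) if dists else 0.0


def total_loss(ctc_loss, embeddings, labels, weight=0.1):
    # Auxiliary-objective form: CTC loss plus a weighted SupCon term.
    # The weight value here is hypothetical, not taken from the paper.
    return ctc_loss + weight * supcon_loss(embeddings, labels)


if __name__ == "__main__":
    # Toy batch: two transcripts, two utterances each (made-up vectors).
    emb = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
    transcript_ids = [0, 0, 1, 1]
    print(supcon_loss(emb, transcript_ids))
    print(within_group_dispersion(emb, transcript_ids))
    print(total_loss(1.0, emb, transcript_ids))
```

In an actual fine-tuning setup the same computation would be expressed with differentiable tensor operations so that the SupCon term contributes gradients to the encoder. The pure-Python version is only meant to make the arithmetic explicit.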
