AS · CL · SD · Jun 24, 2020

Black-box Adaptation of ASR for Accented Speech

arXiv:2006.13519v1 · 11 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses poor ASR performance on accented speech and offers a practical adaptation method that requires no access to the service's model. The advance is incremental because it builds on existing combination strategies.

The paper tackles the problem of adapting black-box ASR systems to accented speech. Services such as Google's API perform poorly on some accents: on Indian accents, the WER is almost double that on US accents. The proposed method couples an accent-tuned local model with the black-box service and achieves up to 28% relative reduction in WER over both the local and the service models on Indian and Australian accents.

We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent. While leading online ASR services obtain impressive performance on mainstream accents, they perform poorly on sub-populations: we observed that the word error rate (WER) achieved by Google's ASR API on Indian accents is almost twice the WER on US accents. Existing adaptation methods either require access to model parameters or overlay an error-correcting module on output transcripts. We highlight the need for correlating outputs with the original speech to fix accent errors. Accordingly, we propose a novel coupling of an open-source accent-tuned local model with the black-box service, where the output from the service guides frame-level inference in the local model. Our fine-grained merging algorithm is better at fixing accent errors than existing word-level combination strategies. Experiments on Indian and Australian accents with three leading ASR models as the service show that we achieve as much as 28% relative reduction in WER over both the local and service models.
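
The abstract reports its results as word error rate (WER) and as a relative reduction in WER. As a minimal sketch of how these two quantities are usually computed, the Python below uses the standard word-level edit distance. The function names, the example sentences, and the WER values 0.25 and 0.18 are illustrative assumptions, not taken from the paper, and the code does not reproduce the paper's merging algorithm or its evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration; computed with the standard
    Levenshtein distance over whitespace-separated words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative transcripts (made up): one substitution and one deletion
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Illustrative WER values (made up): a drop from 0.25 to 0.18 is a
# relative reduction of about 0.28.
reduction = relative_wer_reduction(0.25, 0.18)
```

Under this definition, a relative reduction is measured against the baseline's own WER, so the same absolute improvement counts for more when the baseline is already low.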

Code Implementations: 1 repo
