CL | SD | AS | May 25, 2023

Svarah: Evaluating English ASR Systems on Indian Accents

arXiv:2305.15760v1 | 22 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This paper addresses the poor performance of English ASR systems on Indian English speakers. The contribution is incremental: it introduces a new benchmark rather than a novel method.

The authors tackled the underrepresentation of Indian accents in English ASR benchmarks by creating Svarah, a 9.6-hour dataset from 117 speakers across India, and showed that existing ASR models have clear room for improvement on these accents.

India is the second largest English-speaking country in the world with a speaker base of roughly 130 million. Thus, it is imperative that automatic speech recognition (ASR) systems for English should be evaluated on Indian accents. Unfortunately, Indian speakers find a very poor representation in existing English ASR benchmarks such as LibriSpeech, Switchboard, Speech Accent Archive, etc. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. Svarah comprises both read speech and spontaneous conversational data, covering various domains, such as history, culture, tourism, etc., ensuring a diverse vocabulary. We evaluate 6 open source ASR models and 2 commercial ASR systems on Svarah and show that there is clear scope for improvement on Indian accents. Svarah as well as all our code will be publicly available.
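The abstract does not say which metric is used to compare the ASR systems. Word error rate (WER) is the usual choice for this kind of benchmark, so the sketch below assumes it. The function names, the lowercase-and-split normalization, and the example sentences are all hypothetical illustrations, not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        # Assumed normalization: lowercase and split on whitespace only.
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words


if __name__ == "__main__":
    # Hypothetical (reference, hypothesis) transcript pairs.
    samples = [
        ("the fort was built in the sixteenth century",
         "the fort was build in sixteenth century"),
        ("tourism is a major industry", "tourism is a major industry"),
    ]
    print(f"WER: {corpus_wer(samples):.3f}")
```

Pooling errors over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score. A real evaluation would also need a text normalization step (punctuation, numerals, spelling variants) agreed on across all systems being compared.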
