LG, Mar 5

Beyond Word Error Rate: Auditing the Diversity Tax in Speech Recognition through Dataset Cartography

Ting-Hui Cheng, Line H. Clemmensen, Sneha Das

arXiv:2603.05267v1

Originality: Incremental advance

AI Analysis

This work addresses the problem of biased ASR system evaluation for developers and auditors, aiming to reveal and mitigate disparities affecting marginalized and atypical speakers.

This paper explores the limitations of Word Error Rate (WER) in evaluating Automatic Speech Recognition (ASR) systems, arguing it obscures the 'diversity tax' on marginalized speakers. By introducing the sample difficulty index (SDI) and using metrics like EmbER and SemDist, the authors demonstrate how these expose hidden systemic biases and inter-model disagreements that WER fails to capture.

Automatic speech recognition (ASR) systems are predominantly evaluated using the Word Error Rate (WER). However, raw token-level metrics fail to capture semantic fidelity and routinely obscure the 'diversity tax', the disproportionate burden placed on marginalized and atypical speakers by systematic recognition failures. In this paper, we explore the limitations of relying solely on lexical counts by systematically evaluating a broader class of non-linear and semantic metrics. To enable rigorous model auditing, we introduce the sample difficulty index (SDI), a novel metric that quantifies how intrinsic demographic and acoustic factors drive model failure. By mapping SDI onto data cartography, we demonstrate that the metrics EmbER and SemDist expose hidden systemic biases and inter-model disagreements that WER ignores. Finally, our findings are a first step towards a robust audit framework for prospective safety analysis, empowering developers to audit and mitigate ASR disparities prior to deployment.
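To make the abstract's concern about aggregate WER concrete, here is a minimal Python sketch of the standard WER computation (word-level edit distance divided by the number of reference words), reported both overall and per speaker group. It is an illustration under stated assumptions, not the authors' audit pipeline: the group labels, the toy transcripts, and the helper names `word_edit_distance`, `wer`, and `wer_by_group` are all hypothetical. SDI, EmbER, and SemDist are not implemented here because the abstract does not give their definitions.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over word tokens."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(pairs):
    """Corpus-level WER: total word edits divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


def wer_by_group(samples):
    """WER computed separately for each (hypothetical) speaker group."""
    groups = defaultdict(list)
    for group, ref, hyp in samples:
        groups[group].append((ref, hyp))
    return {g: wer(p) for g, p in groups.items()}


# Hypothetical toy data: (group label, reference transcript, ASR hypothesis).
samples = [
    ("A", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("A", "what is the weather today", "what is the weather today"),
    ("A", "set a timer for ten minutes", "set a timer for ten minutes"),
    ("B", "call my doctor tomorrow morning", "call my daughter tomorrow"),
]

overall = wer([(r, h) for _, r, h in samples])
per_group = wer_by_group(samples)
print(f"overall WER: {overall:.3f}")
for group, value in sorted(per_group.items()):
    print(f"group {group} WER: {value:.3f}")
```

In this toy set the pooled WER stays below 10% while group B's WER is 40%, and the one substitution ("doctor" to "daughter") changes the meaning of the utterance even though WER counts it the same as any other single-word error. Both effects are the kind of disparity the abstract argues a single aggregate lexical score can hide.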
