CL · SD · AS · May 7, 2020

The Perceptimatic English Benchmark for Speech Perception Models

arXiv:2005.03418v1 · 4 citations
AI Analysis

The benchmark gives researchers in speech perception and automatic speech recognition a domain-specific tool for comparing models against human data.

The authors introduce the Perceptimatic English Benchmark, an open benchmark of ABX stimuli with responses from 91 listeners, for evaluating speech perception models in English. They show that DeepSpeech is more specialized on English phoneme discrimination than human listeners but correlates poorly with their behavior.

We present the Perceptimatic English Benchmark, an open experimental benchmark for evaluating quantitative models of speech perception in English. The benchmark consists of ABX stimuli along with the responses of 91 American English-speaking listeners. The stimuli test discrimination of a large number of English and French phonemic contrasts. They are extracted directly from corpora of read speech, making them appropriate for evaluating statistical acoustic models (such as those used in automatic speech recognition) trained on typical speech data sets. We show that phone discrimination is correlated with several types of models, and give recommendations for researchers seeking easily calculated norms of acoustic distance on experimental stimuli. We show that DeepSpeech, a standard English speech recognizer, is more specialized on English phoneme discrimination than English listeners, and is poorly correlated with their behaviour, even though it yields a low error on the decision task given to humans.
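In an ABX task the listener hears three stimuli, A, B and X, and decides whether X matches A or B. A model can be given the same decision by comparing acoustic distances. The sketch below is illustrative only: the frame features, the dynamic-time-warping distance, the length normalisation, and the function names (`dtw_distance`, `abx_delta`, `abx_correct`) are assumptions made for this example, not the distance or pipeline used by the benchmark's authors. It also assumes, as a convention, that X belongs to the same category as A.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two feature frames (lists of floats)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(xs, ys):
    """Dynamic-time-warping cost between two frame sequences.

    The accumulated cost is divided by len(xs) + len(ys), an assumed
    normalisation that keeps long stimuli from dominating.
    """
    n, m = len(xs), len(ys)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = frame_distance(xs[i - 1], ys[j - 1]) + best_prev
    return cost[n][m] / (n + m)


def abx_delta(a, b, x):
    """Positive when X is closer to A than to B (assumed convention: X matches A)."""
    return dtw_distance(b, x) - dtw_distance(a, x)


def abx_correct(a, b, x):
    """True when the distance-based decision picks A as the match for X."""
    return abx_delta(a, b, x) > 0


# Toy stimuli: two-dimensional frames, made up for illustration.
a = [[0.0, 0.0], [1.0, 1.0]]
b = [[5.0, 5.0], [6.0, 6.0]]
x = [[0.1, 0.0], [0.9, 1.1], [1.0, 1.0]]
print(abx_correct(a, b, x))
```

Averaging `abx_correct` over many triplets gives a model error rate for a contrast, while the continuous `abx_delta` values are the kind of quantity one could correlate with listener responses.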
