CL | AI | Oct 12, 2020

Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

arXiv:2010.05961v1 | 6 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses how to evaluate unsupervised speech models against human perception, a question of interest to speech processing researchers. The contribution is incremental, since it builds on existing datasets and methods.

The authors introduce Perceptimatic, a benchmark data set for comparing speech processing models with human behaviour on a phone discrimination task. They show that a standard supervised monolingual HMM-GMM system discriminates phones well but yields a representational space very different from that of human native listeners.

In this paper, we present a data set and methods to compare speech processing models and human behaviour on a phone discrimination task. We provide Perceptimatic, an open data set which consists of French and English speech stimuli, as well as the results of 91 English- and 93 French-speaking listeners. The stimuli test a wide range of French and English contrasts, and are extracted directly from corpora of natural running read speech, used for the 2017 Zero Resource Speech Challenge. We provide a method to compare humans' perceptual space with models' representational space, and we apply it to models previously submitted to the Challenge. We show that, unlike unsupervised models and supervised multilingual models, a standard supervised monolingual HMM-GMM phone recognition system, while good at discriminating phones, yields a representational space very different from that of human native listeners.
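To make the idea of comparing a model's representational space with listeners' behaviour more concrete, here is a minimal sketch. It assumes an ABX-style discrimination trial (is X closer to A or to B?), which the abstract itself does not specify. The function names, the toy two-dimensional "representations", and the human accuracy values are all made up for illustration and are not taken from the Perceptimatic data set or its code.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def abx_delta(a, b, x, dist=euclidean):
    """Hypothetical discriminability score for one trial.

    X is assumed to belong to the same category as A, so a positive
    value means the model places X closer to A than to B.
    """
    return dist(b, x) - dist(a, x)

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy (A, B, X) triplets of model representations; illustrative only.
triplets = [
    ((0.0, 0.0), (4.0, 0.0), (0.5, 0.0)),  # well-separated contrast
    ((0.0, 0.0), (1.0, 0.0), (0.4, 0.0)),  # close contrast, still correct
    ((0.0, 0.0), (1.0, 0.0), (0.6, 0.0)),  # X lands nearer to B: an error
]
# Invented per-trial proportions of listeners answering correctly.
human_accuracy = [0.95, 0.70, 0.55]

deltas = [abx_delta(a, b, x) for a, b, x in triplets]

# How well the model discriminates on its own.
model_accuracy = sum(d > 0 for d in deltas) / len(deltas)

# How closely the model's graded scores track human behaviour.
agreement = pearson(deltas, human_accuracy)

print(f"model accuracy: {model_accuracy:.2f}")
print(f"model/human agreement (Pearson r): {agreement:.2f}")
```

The two numbers are kept separate on purpose: a model can score well on accuracy while its graded distances track human responses poorly, which is the kind of dissociation the abstract reports for the supervised monolingual HMM-GMM system.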

Code Implementations: 2 repos
