CL · AI · Apr 29, 2021

The Zero Resource Speech Challenge 2021: Spoken language modelling

arXiv:2104.14700v2 · 67 citations
AI Analysis

This paper addresses the problem of developing speech models without labeled data and is aimed at researchers in speech processing. Its contribution is incremental, as it builds on prior Zero Resource Speech Challenges.

The paper introduced the Zero Resource Speech Challenge 2021, which tasked participants with learning a language model directly from audio without text or labels, using the Libri-light dataset, and reported results from eight submitted systems evaluated across acoustic, lexical, syntactic, and semantic metrics.

We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audio books without any associated text. We provide a pipeline baseline system consisting of an encoder based on contrastive predictive coding (CPC), a quantizer ($k$-means) and a standard language model (BERT or LSTM). The metrics evaluate the learned representations at the acoustic (ABX discrimination), lexical (spot-the-word), syntactic (acceptability judgment) and semantic (similarity judgment) levels. We present an overview of the eight submitted systems from four groups and discuss the main results.
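
As a rough illustration of the quantizer step in the baseline pipeline and of how a spot-the-word score could be computed, here is a minimal sketch. It is not the challenge's code: random vectors stand in for CPC encoder output, the language model step is omitted, and the names `kmeans_fit`, `quantize`, `spot_the_word_accuracy` and `score` are hypothetical.

```python
import numpy as np


def quantize(features, centroids):
    """Map each frame to the index of its nearest centroid (its discrete unit)."""
    # Squared Euclidean distance between every frame and every centroid.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans_fit(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns an array of shape (k, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames.
    picks = rng.choice(len(features), size=k, replace=False)
    centroids = features[picks].copy()
    for _ in range(n_iter):
        units = quantize(features, centroids)
        for j in range(k):
            members = features[units == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def spot_the_word_accuracy(score, pairs):
    """Fraction of (word, nonword) pairs for which the word scores higher.

    `score` is an assumed callable returning a model score for one item,
    for example a (pseudo) log-probability of its unit sequence.
    """
    correct = sum(score(word) > score(nonword) for word, nonword in pairs)
    return correct / len(pairs)


# Stand-in for encoder output: 500 frames of 16-dimensional features.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 16))

centroids = kmeans_fit(frames, k=8)
units = quantize(frames, centroids)  # one discrete unit index per frame
```

In a pipeline of this kind, the sequence in `units` would be the input to the language model, and `spot_the_word_accuracy` would be called with that model's scoring function over matched word and non-word pairs.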

