CL | SD | AS | Mar 26, 2021

Leveraging pre-trained representations to improve access to untranscribed speech from endangered languages

arXiv:2103.14583v3 | 19 citations
AI Analysis

This work addresses the challenge of accessing untranscribed speech from endangered languages, which often lack sufficient data for standard ASR methods. The contribution is incremental: it applies existing pre-trained representations to a specific task.

The paper improves query-by-example spoken term detection (QbE-STD) for endangered languages with limited data by using pre-trained wav2vec 2.0 speech representations. It reports relative improvements of 56-86% over state-of-the-art approaches on datasets from 7 Australian Aboriginal languages and a regional variety of Dutch.

Pre-trained speech representations like wav2vec 2.0 are a powerful tool for automatic speech recognition (ASR). Yet many endangered languages lack sufficient data for pre-training such models, or are predominantly oral vernaculars without a standardised writing system, precluding fine-tuning. Query-by-example spoken term detection (QbE-STD) offers an alternative for iteratively indexing untranscribed speech corpora by locating spoken query terms. Using data from 7 Australian Aboriginal languages and a regional variety of Dutch, all of which are endangered or vulnerable, we show that QbE-STD can be improved by leveraging representations developed for ASR (wav2vec 2.0: the English monolingual model and XLSR53 multilingual model). Surprisingly, the English model outperformed the multilingual model on 4 Australian language datasets, raising questions around how to optimally leverage self-supervised speech representations for QbE-STD. Nevertheless, we find that wav2vec 2.0 representations (either English or XLSR53) offer large improvements (56-86% relative) over state-of-the-art approaches on our endangered language datasets.
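To make the QbE-STD idea concrete, the following is a minimal sketch of locating a spoken query inside a longer recording by comparing sequences of frame-level feature vectors. It rests on several assumptions that the abstract does not confirm: the matching step is done with subsequence dynamic time warping (a common choice for QbE-STD, used here purely for illustration), the frame distance is cosine distance, and random vectors stand in for features that would in practice come from a pre-trained model such as wav2vec 2.0. The function names `cosine_distance` and `subsequence_dtw` are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """Cosine distance between two equal-length feature vectors (clamped at 0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def subsequence_dtw(query, reference):
    """Return (cost, end_frame) for the best match of `query` inside `reference`.

    The cost is the accumulated frame distance divided by the query length,
    so lower values indicate a more likely occurrence of the query.
    """
    n, m = len(query), len(reference)
    inf = float("inf")
    # prev[j] holds the best accumulated cost of aligning the query frames seen
    # so far with a path ending at reference frame j - 1. The initial row is
    # all zeros so that a match may start at any reference frame.
    prev = [0.0] * (m + 1)
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            d = cosine_distance(query[i - 1], reference[j - 1])
            curr[j] = d + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    end = min(range(1, m + 1), key=lambda j: prev[j])
    return prev[end] / n, end - 1


# Stand-in data (assumption): 50 reference frames of 8-dimensional features.
rng = random.Random(0)
reference = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]

# The query is a copy of reference frames 20..29, so it is known to occur there.
query = [list(frame) for frame in reference[20:30]]
cost, end = subsequence_dtw(query, reference)

# An unrelated query should match noticeably worse.
unrelated = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(10)]
unrelated_cost, _ = subsequence_dtw(unrelated, reference)
```

In this toy setup the planted query is recovered with a near-zero cost ending at frame 29, while the unrelated query scores worse. Ranking reference recordings by such a cost is one way a corpus could be indexed iteratively without transcriptions.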

Code Implementations: 1 repo