CL · Nov 12, 2020

Enabling Interactive Transcription in an Indigenous Community

arXiv:2011.06198v1 · 990 citations
AI Analysis

This work addresses the transcription challenges faced by Indigenous communities working with endangered languages. Its contribution is incremental, since it builds on existing methods for low-resource scenarios.

The authors tackle the problem of transcribing endangered languages from minimal initial data. They propose a workflow that combines spoken term detection with human-in-the-loop verification, and show that it can bootstrap transcription from a small set of isolated words.

We propose a novel transcription workflow that combines spoken term detection with a human in the loop, and we report a pilot experiment. This work is grounded in an almost zero-resource scenario involving two endangered languages, in which only a few terms have so far been identified. We show that in the early stages of transcription, when the available data is insufficient to train a robust ASR system, it is possible to take advantage of the transcription of a small number of isolated words in order to bootstrap the transcription of a speech collection.
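The summary does not say which spoken term detection technique the authors use, so the sketch below is an assumption for illustration only: a query-by-example search with subsequence dynamic time warping (DTW) over frame-level feature vectors, followed by a human confirmation step. All names (`subsequence_dtw`, `bootstrap_round`, the `confirm` callback, the `threshold` value) are hypothetical, and the 1-D "features" stand in for real acoustic features such as MFCCs.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per time frame (assumed, e.g. MFCCs)


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Frame], utt: List[Frame]) -> Tuple[float, int]:
    """Best DTW cost of `query` against any span of `utt`, divided by query length.

    The match may start and end at any utterance frame. Returns (cost, end_frame).
    """
    if not query or not utt:
        raise ValueError("query and utterance must be non-empty")
    n, m = len(query), len(utt)
    inf = float("inf")
    # D[i][j]: cost of aligning query[:i+1] with a span of utt ending at frame j
    D = [[inf] * m for _ in range(n)]
    for j in range(m):
        D[0][j] = frame_dist(query[0], utt[j])  # free start anywhere in utt
    for i in range(1, n):
        for j in range(m):
            best_prev = D[i - 1][j]
            if j > 0:
                best_prev = min(best_prev, D[i - 1][j - 1], D[i][j - 1])
            D[i][j] = frame_dist(query[i], utt[j]) + best_prev
    end = min(range(m), key=lambda j: D[n - 1][j])  # free end
    return D[n - 1][end] / n, end


def bootstrap_round(
    lexicon: Dict[str, List[List[Frame]]],
    utterances: Dict[str, List[Frame]],
    confirm: Callable[[str, str, int], bool],
    threshold: float,
) -> Dict[str, List[str]]:
    """Search each known word in each utterance; keep only hits a human confirms."""
    confirmed: Dict[str, List[str]] = {}
    for word, exemplars in lexicon.items():
        for utt_id, feats in utterances.items():
            # Score against the best-matching spoken exemplar of this word
            cost, end = min(subsequence_dtw(ex, feats) for ex in exemplars)
            # Only low-cost candidates are shown to the human reviewer
            if cost <= threshold and confirm(word, utt_id, end):
                confirmed.setdefault(word, []).append(utt_id)
    return confirmed


def accept_all(word: str, utt_id: str, end: int) -> bool:
    """Stand-in for the human reviewer: accepts every candidate shown."""
    return True


if __name__ == "__main__":
    # Toy 1-D "features": the isolated word is the rising pattern 1, 2, 3
    lexicon = {"word_a": [[[1.0], [2.0], [3.0]]]}
    utterances = {
        "utt1": [[0.0], [0.0], [1.0], [2.0], [2.0], [3.0], [0.0]],
        "utt2": [[5.0], [5.0], [5.0], [5.0]],
    }
    print(bootstrap_round(lexicon, utterances, accept_all, threshold=0.5))
    # prints {'word_a': ['utt1']}
```

The design keeps the human as the final arbiter: the detector only ranks candidates, so a weak detector costs reviewer time rather than introducing wrong transcriptions. Confirmed hits would then serve as partial transcriptions to seed later rounds.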

