CL | Sep 16, 2019

Fast transcription of speech in low-resource languages

arXiv:1909.07285v1 | 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses speech transcription for low-resource language communities. The advance appears incremental, since it builds on existing components such as pretrained acoustic models and language models.

The authors tackle transcription of speech in low-resource languages with software that needs only a small amount of text data and a zero-resource grapheme-to-phoneme table. It transcribes forty hours of speech in a few hours and has been applied to corpora in multiple languages.

We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language, and a zero-resource grapheme-to-phoneme (G2P) table. A pretrained acoustic model maps acoustic features to phonemes; a reversed G2P maps these to graphemes; then a language model maps these to a most-likely grapheme sequence, i.e., a transcription. This software has worked successfully with corpora in Arabic, Assamese, Kinyarwanda, Russian, Sinhalese, Swahili, Tagalog, and Tamil.
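
The last two stages of the pipeline described in the abstract (reversed G2P, then language-model rescoring) can be illustrated with a small sketch. This is not the authors' implementation. It assumes the acoustic model has already produced a phoneme sequence, it uses a made-up five-entry G2P table, and it swaps in a character-level bigram model with add-one smoothing where the real system would use a stronger language model. All names here (`G2P`, `reverse_g2p`, `train_bigram`, `decode`) are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy G2P table (grapheme -> phoneme); not from the paper.
G2P = {"c": "k", "k": "k", "a": "a", "t": "t", "i": "i"}


def reverse_g2p(g2p):
    """Invert grapheme -> phoneme into phoneme -> candidate graphemes."""
    rev = defaultdict(list)
    for grapheme, phoneme in g2p.items():
        rev[phoneme].append(grapheme)
    return {p: sorted(gs) for p, gs in rev.items()}


def train_bigram(text, alphabet, alpha=1.0):
    """Return logp(prev, ch) from a character bigram model with add-alpha
    smoothing. "^" marks the start of a word."""
    counts = defaultdict(lambda: defaultdict(float))
    for word in text.split():
        prev = "^"
        for ch in word:
            counts[prev][ch] += 1
            prev = ch

    def logp(prev, ch):
        total = sum(counts[prev].values())
        return math.log((counts[prev][ch] + alpha) / (total + alpha * len(alphabet)))

    return logp


def decode(phonemes, rev, logp):
    """Viterbi search: pick the grapheme sequence with the highest
    language-model score among those the reversed G2P allows."""
    best = {"^": (0.0, "")}  # last grapheme -> (score, sequence so far)
    for ph in phonemes:
        nxt = {}
        for g in rev[ph]:
            score, seq = max(
                ((s + logp(prev, g), q) for prev, (s, q) in best.items()),
                key=lambda t: t[0],
            )
            nxt[g] = (score, seq + g)
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


rev = reverse_g2p(G2P)
logp = train_bigram("cat cat cat kit", alphabet=set(G2P))
print(decode(["k", "a", "t"], rev, logp))  # cat
```

The phoneme "k" maps back to both "c" and "k", so the reversed table alone is ambiguous. The language model, trained here on a tiny stand-in for the "few tens of megabytes of noisy text", resolves the ambiguity in favor of the spelling seen most often.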

Code Implementations: 1 repo