Fast transcription of speech in low-resource languages
This work addresses speech transcription for low-resource language communities, though the contribution appears incremental: it builds on existing components such as pretrained acoustic models and language models.
The authors developed software that transcribes speech in low-resource languages using only a small amount of text and a zero-resource grapheme-to-phoneme table; it transcribes forty hours of speech in a few hours and has been applied to multiple languages.
We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language and a zero-resource grapheme-to-phoneme (G2P) table. A pretrained acoustic model maps acoustic features to phonemes; a reversed G2P maps these phonemes to candidate graphemes; then a language model selects the most likely grapheme sequence, i.e., a transcription. This software has worked successfully with corpora in Arabic, Assamese, Kinyarwanda, Russian, Sinhalese, Swahili, Tagalog, and Tamil.
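The last two stages of this pipeline (reversed G2P, then language-model selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy G2P table, the character-bigram language model with add-alpha smoothing, and the Viterbi-style search are all assumptions chosen for brevity, and the phoneme sequence is taken as given rather than produced by an acoustic model. The names `G2P`, `invert_g2p`, `train_bigram_lm`, and `decode` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy G2P table (grapheme -> phoneme); a real table is far larger.
G2P = {"c": "k", "k": "k", "a": "a", "t": "t"}

def invert_g2p(g2p):
    """Reverse a grapheme->phoneme table into phoneme->[candidate graphemes]."""
    p2g = defaultdict(list)
    for grapheme, phoneme in g2p.items():
        p2g[phoneme].append(grapheme)
    return dict(p2g)

def train_bigram_lm(text, alpha=0.1):
    """Return logp(prev, ch): a character-bigram model with add-alpha smoothing.

    Assumption: a character bigram stands in for whatever language model
    the real system trains on its noisy text.
    """
    vocab = {ch for ch in text if not ch.isspace()}
    counts = defaultdict(Counter)
    for word in text.split():
        prev = "^"  # word-start symbol
        for ch in word:
            counts[prev][ch] += 1
            prev = ch

    def logp(prev, ch):
        total = sum(counts[prev].values()) + alpha * len(vocab)
        return math.log((counts[prev][ch] + alpha) / total)

    return logp

def decode(phonemes, p2g, logp):
    """Viterbi search: pick the grapheme string the language model scores highest."""
    # best[g] = (log-probability, text) of the best hypothesis ending in grapheme g
    best = {"^": (0.0, "")}
    for ph in phonemes:
        nxt = {}
        for g in p2g[ph]:
            nxt[g] = max(
                (score + logp(prev, g), text + g)
                for prev, (score, text) in best.items()
            )
        best = nxt
    return max(best.values())[1]

p2g = invert_g2p(G2P)
lm = train_bigram_lm("cat cat cat kit")
print(decode(["k", "a", "t"], p2g, lm))
```

Here the phoneme /k/ is ambiguous between the graphemes "c" and "k"; the language model resolves the ambiguity in favor of the spelling seen more often in the training text.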