CL · Sep 16, 2019

Fast transcription of speech in low-resource languages

Mark Hasegawa-Johnson, Camille Goudeseune, Gina-Anne Levow

arXiv:1909.07285v10.23 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses speech transcription for low-resource language communities. It appears incremental because it builds on existing components such as pretrained acoustic models and language models.

The authors developed software that transcribes speech in low-resource languages using only a few tens of megabytes of noisy text and a zero-resource grapheme-to-phoneme (G2P) table. It transcribes forty hours of recorded speech in a few hours and has been applied to corpora in eight languages.

We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language and a zero-resource grapheme-to-phoneme (G2P) table. A pretrained acoustic model maps acoustic features to phonemes; a reversed G2P maps these phonemes to candidate graphemes; then a language model selects the most likely grapheme sequence, i.e., a transcription. This software has worked successfully with corpora in Arabic, Assamese, Kinyarwanda, Russian, Sinhalese, Swahili, Tagalog, and Tamil.
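The last two stages of the pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the acoustic model has already produced a single best phoneme sequence, that every grapheme is one character mapping to one phoneme, and that the language model is a character bigram model with add-alpha smoothing. The function names (`invert_g2p`, `train_bigram_lm`, `decode`), the G2P table, and the training text are all hypothetical.

```python
import math
from collections import defaultdict


def invert_g2p(g2p):
    """Reverse a grapheme -> phoneme table into phoneme -> [graphemes]."""
    p2g = defaultdict(list)
    for grapheme, phoneme in g2p.items():
        p2g[phoneme].append(grapheme)
    return dict(p2g)


def train_bigram_lm(text, alpha=1.0):
    """Return a function giving add-alpha smoothed log P(cur | prev)."""
    vocab_size = len(set(text))
    counts = defaultdict(lambda: defaultdict(int))
    # "^" is an assumed start-of-text symbol.
    for prev, cur in zip("^" + text, text):
        counts[prev][cur] += 1

    def logprob(prev, cur):
        total = sum(counts[prev].values())
        return math.log((counts[prev][cur] + alpha) / (total + alpha * vocab_size))

    return logprob


def decode(phonemes, p2g, logprob):
    """Viterbi search for the most likely grapheme string under the bigram LM."""
    # best maps the last grapheme of a path to (log score, string so far).
    best = {"^": (0.0, "")}
    for ph in phonemes:
        nxt = {}
        for g in p2g[ph]:
            score, prev = max((s + logprob(p, g), p) for p, (s, _) in best.items())
            nxt[g] = (score, best[prev][1] + g)
        best = nxt
    return max(best.values())[1]


# Toy data (hypothetical): "c" and "k" both map to the phoneme "k".
g2p = {"c": "k", "k": "k", "a": "a", "t": "t"}
p2g = invert_g2p(g2p)
lm = train_bigram_lm("catcatcat")
result = decode(["k", "a", "t"], p2g, lm)
print(result)
```

Here the reversed table offers both "c" and "k" for the phoneme "k", and the language model, trained on text that contains only "c", resolves the ambiguity. A real system would also need to handle multi-character graphemes and uncertainty in the acoustic model output, which this sketch ignores.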

