CL · Apr 16, 2021

Segmenting Subtitles for Correcting ASR Segmentation Errors

David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuščáková, Elena Zotkina, Zhengping Jiang, Peter Bell, Kathleen McKeown

arXiv:2104.07868v1

Originality: Incremental advance

AI Analysis

This work addresses ASR segmentation errors for low-resource languages in order to improve downstream translation and information retrieval. It appears incremental, since it builds on existing methods and contributes mainly a new dataset.

The paper tackles the mismatch between ASR acoustic segmentation and sentence-like units needed for machine translation by proposing a model that corrects ASR segmentation using subtitles as a proxy dataset, showing improvements in downstream tasks like MT and CLIR.

Typical ASR systems segment the input audio into utterances using purely acoustic information, which may not resemble the sentence-like units that are expected by conventional machine translation (MT) systems for Spoken Language Translation. In this work, we propose a model for correcting the acoustic segmentation of ASR models for low-resource languages to improve performance on downstream tasks. We propose the use of subtitles as a proxy dataset for correcting ASR acoustic segmentation, creating synthetic acoustic utterances by modeling common error modes. We train a neural tagging model for correcting ASR acoustic segmentation and show that it improves downstream performance on MT and audio-document cross-language information retrieval (CLIR).
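The proxy-data idea in the abstract can be illustrated with a minimal sketch. It assumes subtitle lines stand in for sentence-like units and that synthetic "acoustic" utterances are produced by cutting the token stream at random positions. The random cut is a placeholder assumption, not the error model used in the paper, and the names `make_tagging_example`, `resegment`, and `cut_prob` are hypothetical. Each token gets a label of 1 if it ends a sentence, which is the kind of target a tagging model could be trained to predict.

```python
import random
from typing import List, Tuple


def make_tagging_example(
    sentences: List[str], cut_prob: float = 0.2, seed: int = 0
) -> Tuple[List[List[str]], List[int]]:
    """Build synthetic utterances and per-token sentence-end labels.

    Assumption: utterance breaks are drawn at random, independent of the
    true sentence boundaries. This only mimics the mismatch between
    acoustic segments and sentences; it is not the paper's error model.
    """
    rng = random.Random(seed)
    tokens: List[str] = []
    labels: List[int] = []  # 1 if the token is the last word of a sentence
    for sent in sentences:
        words = sent.split()
        if not words:
            continue
        tokens.extend(words)
        labels.extend([0] * (len(words) - 1) + [1])

    utterances: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if rng.random() < cut_prob:  # synthetic acoustic break after this token
            utterances.append(current)
            current = []
    if current:
        utterances.append(current)
    return utterances, labels


def resegment(utterances: List[List[str]], labels: List[int]) -> List[str]:
    """Ignore utterance breaks and split the token stream where labels are 1."""
    tokens = [tok for utt in utterances for tok in utt]
    sentences: List[str] = []
    current: List[str] = []
    for tok, is_end in zip(tokens, labels):
        current.append(tok)
        if is_end:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing tokens with no predicted boundary
        sentences.append(" ".join(current))
    return sentences
```

In this sketch the gold labels recover the original subtitle lines exactly; a trained tagger would supply predicted labels to `resegment` before the text is passed to MT or CLIR.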
