CL AI LG SD AS Sep 24, 2023

Human Transcription Quality Improvement

Jian Gao, Hanbo Sun, Cheng Cao, Zheng Du

arXiv:2309.14372v11.35 citations h-index: 5 Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for affordable, high-quality transcription data for ASR research, offering a significant improvement over existing crowdsourced methods.

The paper tackles the problem of low-quality crowdsourced speech transcriptions by proposing a method that reduces transcription word error rate (WER) by over 50% and improves ASR model performance with over 10% relative WER reduction.

High-quality transcription data is crucial for training automatic speech recognition (ASR) systems. However, existing industry-level data collection pipelines are expensive for researchers, while the quality of crowdsourced transcription is low. In this paper, we propose a reliable method to collect speech transcriptions. We introduce two mechanisms to improve transcription quality: confidence-estimation-based reprocessing at the labeling stage, and automatic word error correction at the post-labeling stage. We collect and release LibriCrowd, a large-scale crowdsourced dataset of audio transcriptions on 100 hours of English speech. Experiments show that the transcription WER is reduced by over 50%. We further investigate the impact of transcription errors on ASR model performance and find a strong correlation. The improvement in transcription quality yields over 10% relative WER reduction for ASR models. We release the dataset and code to benefit the research community.
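The two headline numbers above are both stated in terms of word error rate (WER), so it may help to see how that metric and a relative reduction are typically computed. The following is a minimal sketch using the standard word-level edit-distance definition of WER; it is not taken from the paper's released code, and the function names `word_error_rate` and `relative_wer_reduction`, as well as the lowercase-and-split normalization, are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: text is normalized only by lowercasing and whitespace
    splitting; real pipelines usually also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

As a toy example (not data from the paper), scoring the transcription "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. Under this definition, a "relative WER reduction of over 10%" means `relative_wer_reduction` returns a value above 0.10, regardless of the absolute WER level.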

