Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
This corpus provides a domain-specific resource for speech translation, addressing a data bottleneck for researchers and practitioners working on conversational telephone speech.
The paper tackles the lack of paired speech-text data for Mandarin-English conversational telephone speech by introducing a 123-hour corpus. Fine-tuning a general-purpose translation model on this dataset improves target-domain BLEU by more than 8 points.
This paper introduces a set of English translations for a 123-hour subset of the CallHome Mandarin Chinese data and the HKUST Mandarin Telephone Speech data for the task of speech translation. Paired source-language speech and target-language text are essential for training end-to-end speech translation systems and can also provide substantial performance improvements for cascaded systems, relative to training on more widely available text data sets. We demonstrate that fine-tuning a general-purpose translation model on our Mandarin-English conversational telephone speech training set improves target-domain BLEU by more than 8 points, highlighting the importance of matched training data.
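To make the reported metric concrete, the following is a minimal sketch of corpus-level BLEU on a 0 to 100 scale, the scale on which a gain of "8 points" is measured. It is an illustration only, not the paper's evaluation setup: the function names `ngram_counts` and `corpus_bleu` are hypothetical, tokenization is plain whitespace splitting, a single reference per hypothesis is assumed, no smoothing is applied, and the example sentences are invented toy data rather than corpus content.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumptions: whitespace tokenization, no smoothing (hypothetical helper).
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Counter intersection clips each count by its count in the reference.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty lowers the score when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Toy data, invented for illustration.
toy_references = ["i will call you back later tonight"]
toy_hypotheses = ["i will call you later tonight"]
toy_score = corpus_bleu(toy_hypotheses, toy_references)
```

Scores from a sketch like this are not comparable to published numbers, because BLEU is sensitive to tokenization and reference handling; a standardized tool would normally be used for reporting.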