CL, Aug 8, 2022

A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

arXiv:2208.04243v1, 5 citations, h-index: 34, Has Code
Originality: Synthesis-oriented
AI Analysis

This paper provides a foundational dataset and empirical study for future research and applications in English-Vietnamese speech translation, filling a gap for this language pair.

The authors address the lack of a large-scale dataset for English-Vietnamese speech translation by building a high-quality benchmark of 508 audio hours and 331K triplets. In their experiments with strong baselines, the traditional cascaded approach outperforms the modern end-to-end approach.

In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence). We also conduct empirical experiments using strong baselines and find that the traditional "Cascaded" approach still outperforms the modern "End-to-End" approach. To the best of our knowledge, this is the first large-scale English-Vietnamese speech translation study. We hope both our publicly available dataset and study can serve as a starting point for future research and applications on English-Vietnamese speech translation. Our dataset is available at https://github.com/VinAIResearch/PhoST
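As a rough illustration of the triplet structure and the "Cascaded" idea described in the abstract, the following minimal Python sketch may help. The class name, field names, and the `make_cascaded` helper are hypothetical and do not reflect the actual file layout or code of the PhoST repository. The mean clip length is only an estimate, since "331K" is a rounded figure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Triplet:
    """One dataset example; field names are illustrative assumptions."""
    audio_path: str      # sentence-length audio clip
    en_transcript: str   # English source transcript sentence
    vi_subtitle: str     # Vietnamese target subtitle sentence


# Figures quoted in the abstract ("331K" is rounded, so results are approximate).
TOTAL_AUDIO_HOURS = 508
NUM_TRIPLETS = 331_000


def mean_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Average clip duration in seconds implied by the dataset totals."""
    return total_hours * 3600 / num_clips


def make_cascaded(
    asr: Callable[[str], str],
    mt: Callable[[str], str],
) -> Callable[[str], str]:
    """Compose a speech recognizer and a text translator into one function.

    This is the generic cascaded pattern (audio -> English text -> Vietnamese
    text); an end-to-end system would instead map audio to Vietnamese text
    directly with a single model.
    """
    def translate(audio_path: str) -> str:
        return mt(asr(audio_path))
    return translate
```

Under these assumptions, `mean_clip_seconds(TOTAL_AUDIO_HOURS, NUM_TRIPLETS)` works out to roughly 5.5 seconds per clip, which is consistent with the audio being segmented at sentence level.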

Code Implementations: 1 repo
