CL / SD / AS, Jun 14, 2023

Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data

arXiv:2306.08582v1226 citations, h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses latency and non-monotonic word order in simultaneous speech translation. The contribution is incremental, since it builds on existing data-mixing approaches.

The paper tackles simultaneous speech translation for distant language pairs such as English-Japanese by training a single model on both simultaneous interpretation data and offline bilingual data, each marked with a style tag. The results show improved BLEURT scores across several latency ranges and more simultaneous-interpretation-style outputs than the baseline.

Simultaneous speech translation (SimulST) translates partial speech inputs incrementally. A monotonic correspondence between input and output is preferable for low latency, but it does not hold for distant language pairs such as English and Japanese. A promising approach to this problem is to mimic simultaneous interpretation (SI) by training a SimulST model on SI data. However, such SI data is limited in size, so it should be used together with ordinary bilingual data whose translations were produced offline. In this paper, we propose an effective way to train a SimulST model on a mixture of SI and offline data. The proposed method trains a single model on the mixed data with style tags that tell the model to generate SI-style or offline-style outputs. Experimental results show BLEURT improvements in different latency ranges, and our analyses reveal that the proposed model generates SI-style outputs more often than the baseline.
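
The data-mixing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag strings `<si>` and `<off>`, the `Example` record, the choice to prepend the tag to the target text, and the helper names are all assumptions made for illustration, since the abstract only states that style tags mark each example as SI-style or offline-style.

```python
import random
from dataclasses import dataclass

# Hypothetical tag strings: the abstract only says "style tags".
SI_TAG = "<si>"
OFFLINE_TAG = "<off>"


@dataclass
class Example:
    audio_path: str  # path to the source speech segment (placeholder value)
    target: str      # target-language text


def tag_examples(examples, tag):
    """Prepend a style tag to each target so one model can learn both styles."""
    return [Example(ex.audio_path, f"{tag} {ex.target}") for ex in examples]


def build_mixed_training_set(si_data, offline_data, seed=0):
    """Tag both corpora, concatenate them, and shuffle deterministically."""
    mixed = tag_examples(si_data, SI_TAG) + tag_examples(offline_data, OFFLINE_TAG)
    random.Random(seed).shuffle(mixed)
    return mixed


# Toy data: a small SI corpus and a larger offline corpus.
si_data = [Example("talk1_seg1.wav", "SI-style target text")]
offline_data = [
    Example("talk1_seg1.wav", "offline-style target text"),
    Example("talk2_seg1.wav", "another offline-style target"),
]

mixed = build_mixed_training_set(si_data, offline_data)
for ex in mixed:
    print(ex.audio_path, "->", ex.target)
```

At inference time, under the same assumptions, the desired style would be selected by supplying the corresponding tag to the model, for example by forcing `<si>` as the first output token.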

