CL · Sep 26, 2023

Segmentation-Free Streaming Machine Translation

Javier Iranzo-Sánchez, Jorge Iranzo-Sánchez, Adrià Giménez, Jorge Civera, Alfons Juan

arXiv:2309.14823v28.526 citations · h-index: 17 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in real-time translation systems for applications that require low latency and high-quality output. The contribution is incremental, since it builds on existing streaming MT methods.

The paper tackles errors in streaming machine translation that arise from hard segmentation in traditional cascade approaches. It proposes a Segmentation-Free framework that delays the segmentation decision until after the translation has been generated, which yields a better quality-latency trade-off than methods that rely on an independent segmentation model.

Streaming Machine Translation (MT) is the task of translating an unbounded input text stream in real time. The traditional cascade approach, which combines an Automatic Speech Recognition (ASR) system and an MT system, relies on an intermediate segmentation step that splits the transcription stream into sentence-like units. However, imposing a hard segmentation constrains the MT system and is a source of errors. This paper proposes a Segmentation-Free framework that enables the model to translate an unsegmented source stream by delaying the segmentation decision until the translation has been generated. Extensive experiments show that the proposed Segmentation-Free framework achieves a better quality-latency trade-off than competing approaches that use an independent segmentation model. Software, data and models will be released upon paper acceptance.
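
The contrast between the two control flows can be illustrated with a toy sketch. This is not the authors' implementation: the word-for-word lexicon, the fixed-length source segmenter (a stand-in for an independent segmentation model), and the function names `cascade` and `segmentation_free` are all hypothetical, chosen only to show where the segmentation decision is taken in each case.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical toy lexicon standing in for a real MT model. The source side
# is unpunctuated, as ASR output often is; punctuation appears only in the target.
TOY_LEXICON = {
    "hola": "hello",
    "mundo": "world.",
    "adiós": "goodbye",
    "amigos": "friends.",
}


def toy_translate_token(src_token: str) -> str:
    """Context-free word lookup (an assumption; a real model uses context)."""
    return TOY_LEXICON.get(src_token, src_token)


def cascade(stream: Iterable[str], segment_len: int) -> Iterator[List[str]]:
    """Hard segmentation first: cut the source every `segment_len` tokens,
    then translate each chunk in isolation."""
    buffer: List[str] = []
    for tok in stream:
        buffer.append(tok)
        if len(buffer) == segment_len:
            yield [toy_translate_token(t) for t in buffer]
            buffer = []
    if buffer:
        yield [toy_translate_token(t) for t in buffer]


def segmentation_free(
    stream: Iterable[str], is_boundary: Callable[[str], bool]
) -> Iterator[List[str]]:
    """Translate the unsegmented stream and decide on a boundary only after
    the target token has been generated."""
    target: List[str] = []
    for tok in stream:
        out = toy_translate_token(tok)
        target.append(out)
        if is_boundary(out):  # decision taken on the generated translation
            yield target
            target = []
    if target:
        yield target


if __name__ == "__main__":
    source = ["hola", "mundo", "adiós", "amigos"]
    print(list(cascade(source, segment_len=3)))
    print(list(segmentation_free(source, lambda t: t.endswith("."))))
```

In this toy run the fixed-length segmenter cuts after three source tokens and yields `[['hello', 'world.', 'goodbye'], ['friends.']]`, while the delayed decision yields `[['hello', 'world.'], ['goodbye', 'friends.']]`. The sketch says nothing about the quality or latency of the actual framework.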
