CL
Sep 26, 2023

Segmentation-Free Streaming Machine Translation

arXiv:2309.14823v2
26 citations
h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in real-time translation systems, which need both low latency and high-quality output. The advance is incremental, since it builds on existing streaming MT methods.

The paper tackles errors in streaming machine translation that are caused by the hard segmentation step in traditional cascade approaches. It proposes a Segmentation-Free framework that delays segmentation until after the translation has been generated, which yields a better quality-latency trade-off than methods that use an independent segmentation model.

Streaming Machine Translation (MT) is the task of translating an unbounded input text stream in real time. The traditional cascade approach, which combines an Automatic Speech Recognition (ASR) system and an MT system, relies on an intermediate segmentation step that splits the transcription stream into sentence-like units. However, this hard segmentation constrains the MT system and is a source of errors. This paper proposes a Segmentation-Free framework that enables the model to translate an unsegmented source stream by delaying the segmentation decision until the translation has been generated. Extensive experiments show that the proposed Segmentation-Free framework achieves a better quality-latency trade-off than competing approaches that use an independent segmentation model. Software, data and models will be released upon paper acceptance.
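To make the contrast in the abstract concrete, here is a toy sketch of the two control flows. It is not the paper's model or released code: `toy_translate`, `cascade`, `segmentation_free`, the fixed-length segmenter, and the use of a trailing period as the target-side boundary signal are all hypothetical stand-ins chosen for illustration.

```python
from typing import Iterable, List

EOS = "."  # assumption: a trailing period marks a sentence end on the target side


def toy_translate(word: str) -> str:
    """Hypothetical stand-in for an MT model: upper-cases one source word."""
    return word.upper()


def cascade(stream: Iterable[str], segment_len: int) -> List[List[str]]:
    """Hard segmentation first: cut the source every `segment_len` words,
    then translate each chunk in isolation."""
    buffer: List[str] = []
    output: List[List[str]] = []
    for word in stream:
        buffer.append(word)
        if len(buffer) == segment_len:
            output.append([toy_translate(w) for w in buffer])
            buffer = []
    if buffer:  # flush the final partial chunk
        output.append([toy_translate(w) for w in buffer])
    return output


def segmentation_free(stream: Iterable[str]) -> List[List[str]]:
    """Translate the unsegmented stream and decide the boundary only after
    the target word has been generated."""
    current: List[str] = []
    output: List[List[str]] = []
    for word in stream:
        target = toy_translate(word)
        current.append(target)
        if target.endswith(EOS):  # boundary decided on the generated output
            output.append(current)
            current = []
    if current:  # flush any unfinished sentence
        output.append(current)
    return output


stream = "hello world. how are you.".split()
cascade_out = cascade(stream, segment_len=3)
free_out = segmentation_free(stream)
print(cascade_out)
print(free_out)
```

In this toy run the fixed-length segmenter cuts in the middle of the second sentence, while the deferred decision keeps each sentence whole. A real system would replace the upper-casing stub with a streaming MT model and a learned boundary decision.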

Code Implementations: 1 repo
