CL SD AS
May 22, 2023

Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters

arXiv:2305.13204v1
5 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for speech-aligned translations in automatic dubbing. It is an incremental improvement over previous approaches.

The paper tackles isochronous machine translation for automatic dubbing. It introduces target factors and auxiliary counters in a transformer model so that durations are predicted jointly with target phoneme sequences, which improves translation quality and isochrony over prior methods.

To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e. translated speech needs to be aligned with the source in terms of speech durations. We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We also introduce auxiliary counters to help the decoder to keep track of the timing information while generating target phonemes. We show that our model improves translation quality and isochrony compared to previous work where the translation model is instead trained to predict interleaved sequences of phonemes and durations.
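The difference between the two output formats named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the frame-based duration unit, and the particular counter shown (frames remaining in a fixed budget) are assumptions made for illustration only. The sketch contrasts a flat interleaved sequence of phonemes and durations with a factored representation, where durations travel as a parallel stream aligned position by position with the phonemes, and then derives one possible auxiliary counter from the durations.

```python
from typing import List, Tuple


def interleave(phonemes: List[str], durations: List[int]) -> List[str]:
    """Baseline-style format: one flat sequence alternating phoneme and duration tokens."""
    if len(phonemes) != len(durations):
        raise ValueError("each phoneme needs exactly one duration")
    tokens: List[str] = []
    for phoneme, duration in zip(phonemes, durations):
        tokens.extend([phoneme, str(duration)])
    return tokens


def factorize(phonemes: List[str], durations: List[int]) -> Tuple[List[str], List[int]]:
    """Factored format: two parallel streams of equal length, aligned by position."""
    if len(phonemes) != len(durations):
        raise ValueError("each phoneme needs exactly one duration")
    return list(phonemes), list(durations)


def remaining_frames_counter(durations: List[int], budget: int) -> List[int]:
    """Hypothetical auxiliary counter: frames left in the budget before each step.

    The counter definition is an assumption for this sketch; the source only
    states that counters help the decoder track timing information.
    """
    counter: List[int] = []
    remaining = budget
    for duration in durations:
        counter.append(remaining)
        remaining -= duration
    return counter


# Toy example with made-up phonemes and durations (in frames).
phonemes = ["HH", "AH", "L", "OW"]
durations = [3, 5, 2, 6]

interleaved = interleave(phonemes, durations)
main_stream, factor_stream = factorize(phonemes, durations)
counter = remaining_frames_counter(durations, budget=sum(durations))

print(interleaved)    # ['HH', '3', 'AH', '5', 'L', '2', 'OW', '6']
print(main_stream)    # ['HH', 'AH', 'L', 'OW']
print(factor_stream)  # [3, 5, 2, 6]
print(counter)        # [16, 13, 8, 6]
```

In this toy setup the interleaved sequence is twice as long as the phoneme sequence, while the factored form keeps the phoneme sequence at its original length and the counter gives an explicit view of how much of the time budget is left at each step.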

