AppTek's Submission to the IWSLT 2022 Isometric Spoken Language Translation Task
This work addresses the challenge of maintaining translation quality while controlling output length in spoken language translation. The contribution is incremental: it builds on existing methods with task-specific adaptations.
The paper tackles isometric spoken language translation from English to German by developing Transformer-based systems with length control mechanisms, reaching length compliance above 90% while minimizing losses in translation quality as measured by BERTScore and BLEU.
For the constrained condition of the Isometric Spoken Language Translation Task of the IWSLT 2022 evaluation, AppTek developed neural Transformer-based systems for English-to-German with various mechanisms of length control, ranging from source-side and target-side pseudo-tokens to an encoding of the remaining length in characters that replaces the positional encoding. We further increased translation length compliance by sentence-level selection of length-compliant hypotheses from different system variants, as well as by rescoring N-best candidates from a single system. Length-compliant back-translated and forward-translated synthetic data, as well as other parallel data variants derived from the original MuST-C training corpus, were important for a good trade-off between quality and desired length. Our experimental results show that length compliance levels above 90% can be reached while minimizing losses in MT quality as measured by BERTScore and BLEU.
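The sentence-level selection of length-compliant hypotheses described above can be illustrated with a minimal sketch. It is not AppTek's implementation. It assumes that a translation counts as length-compliant when its character count is within ±10% of the source character count (our reading of the task definition, which the abstract itself does not spell out), and that each N-best candidate comes with a model score where higher is better. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def is_length_compliant(source: str, hypothesis: str, tolerance: float = 0.10) -> bool:
    """True if the hypothesis length in characters is within +/- tolerance
    of the source length. The 10% default is an assumption."""
    return abs(len(hypothesis) - len(source)) <= tolerance * len(source)


def select_hypothesis(
    source: str,
    nbest: Sequence[Tuple[str, float]],
    tolerance: float = 0.10,
) -> str:
    """Pick the best-scoring length-compliant candidate from an N-best list.

    nbest holds (hypothesis, score) pairs; a higher score is better.
    If no candidate is compliant, fall back to the one whose length
    is closest to the source length.
    """
    compliant = [(h, s) for h, s in nbest if is_length_compliant(source, h, tolerance)]
    if compliant:
        return max(compliant, key=lambda pair: pair[1])[0]
    return min(nbest, key=lambda pair: abs(len(pair[0]) - len(source)))[0]


def length_compliance(
    sources: Sequence[str],
    hypotheses: Sequence[str],
    tolerance: float = 0.10,
) -> float:
    """Percentage of sentence pairs that are length-compliant."""
    pairs = list(zip(sources, hypotheses))
    if not pairs:
        return 0.0
    hits = sum(is_length_compliant(s, h, tolerance) for s, h in pairs)
    return 100.0 * hits / len(pairs)


# Toy example with made-up scores (not results from the paper).
source = "Thank you very much."
nbest = [
    ("Vielen Dank.", -0.2),                             # too short
    ("Ich danke Ihnen sehr.", -0.5),                    # within the tolerance
    ("Haben Sie vielen herzlichen Dank dafür.", -0.9),  # too long
]
chosen = select_hypothesis(source, nbest)
```

In this toy example the top-scoring candidate is rejected for being too short, and the second candidate is chosen instead. This is the quality versus length trade-off the abstract refers to: selection raises compliance at the possible cost of a lower model score.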