The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task
This work addresses speech translation for multilingual communication. The contribution is incremental, since it builds on the existing Transformer and Conformer architectures with enhancements.
The paper tackles end-to-end speech translation from English audio to German text without intermediate transcription, achieving 33.84 BLEU on the MuST-C En-De test set.
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates English audio directly into German text without intermediate transcription. We use a Transformer-based model architecture and enhance it with Conformer, relative position encoding, and stacked acoustic and textual encoding. To augment the training data, the English transcriptions are translated into German. Finally, we employ ensemble decoding to integrate the predictions from several models trained on different datasets. Combining these techniques, we achieve 33.84 BLEU points on the MuST-C En-De test set, which shows the enormous potential of the end-to-end model.
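The abstract does not say how the ensemble combines model predictions. One common approach, shown here only as a minimal sketch and not as the NiuTrans implementation, is to average the next-token probability distributions of all models at each decoding step. The `StepFn` interface, the function names, and the use of greedy search instead of beam search are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface (an assumption, not from the paper):
# given the target prefix, return log-probabilities over the vocabulary.
StepFn = Callable[[Sequence[int]], List[float]]


def ensemble_log_probs(models: Sequence[StepFn], prefix: Sequence[int]) -> List[float]:
    """Average the models' next-token probabilities and return them as log-probabilities."""
    per_model = [model(prefix) for model in models]
    vocab_size = len(per_model[0])
    averaged = []
    for v in range(vocab_size):
        # Average in probability space, then go back to log space.
        p = sum(math.exp(lp[v]) for lp in per_model) / len(per_model)
        averaged.append(math.log(p) if p > 0.0 else float("-inf"))
    return averaged


def greedy_ensemble_decode(
    models: Sequence[StepFn], bos: int, eos: int, max_len: int = 50
) -> List[int]:
    """Greedy decoding with the averaged distribution (beam search is omitted for brevity)."""
    prefix = [bos]
    for _ in range(max_len):
        scores = ensemble_log_probs(models, prefix)
        next_token = max(range(len(scores)), key=scores.__getitem__)
        prefix.append(next_token)
        if next_token == eos:
            break
    return prefix[1:]  # drop the BOS token
```

In a real speech translation system each step function would also condition on the encoded audio, and the averaged scores would feed a beam search rather than a greedy choice.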