IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task
This paper addresses speech translation for low-resource languages, but its contribution is incremental, since it builds on existing methods.
The paper tackles low-resource speech translation with a cascaded system that combines state-of-the-art models with data augmentation and transfer learning. It achieves the best BLEU scores among submitted systems for Congolese Swahili to English (7.7) and to French (13.7), and the second-best score for Coastal Swahili to English (14.9).
This paper describes the IMS team's submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task. We combine state-of-the-art models with several data augmentation, multi-task, and transfer learning approaches for the automatic speech recognition (ASR) and machine translation (MT) steps of our cascaded system. We also explore the feasibility of a fully end-to-end speech translation (ST) model when only a very constrained amount of ground-truth labeled data is available. Our best system achieves the best performance among all submitted systems for Congolese Swahili to English and to French, with BLEU scores of 7.7 and 13.7, respectively, and the second-best result for Coastal Swahili to English, with a BLEU score of 14.9.
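To make the cascaded architecture concrete, the following is a minimal sketch of how an ASR step and an MT step compose into a speech translation system. It is not the authors' implementation: the class `CascadedST`, the type aliases, and the stand-in functions `toy_asr` and `toy_mt` are all hypothetical names introduced for illustration, and the one-entry lexicon is toy data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type aliases, introduced for illustration only.
Audio = Sequence[float]             # raw waveform samples
ASRModel = Callable[[Audio], str]   # audio -> source-language transcript
MTModel = Callable[[str], str]      # source text -> target-language text


@dataclass
class CascadedST:
    """Speech translation as ASR followed by MT (the cascade)."""
    asr: ASRModel
    mt: MTModel

    def translate(self, audio: Audio) -> str:
        transcript = self.asr(audio)  # step 1: recognize source speech
        return self.mt(transcript)    # step 2: translate the transcript


def toy_asr(audio: Audio) -> str:
    # Stand-in for a trained ASR model: ignores the audio content.
    return "habari ya asubuhi"


def toy_mt(text: str) -> str:
    # Stand-in for a trained MT model: a one-entry toy lexicon.
    lexicon = {"habari ya asubuhi": "good morning"}
    return lexicon.get(text, text)  # unknown input passes through unchanged


system = CascadedST(asr=toy_asr, mt=toy_mt)
translation = system.translate([0.0, 0.1, -0.1])
```

Because the two steps communicate only through text, each can be trained and augmented on its own data. By contrast, an end-to-end ST model maps audio to target text directly and needs paired speech and translation data.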