Bemba Speech Translation: Exploring a Low-Resource African Language
This work addresses speech translation for Bemba, a low-resource African language, but appears incremental: it applies existing methods to a new language rather than introducing a methodological advance.
The authors tackle Bemba-to-English speech translation, a low-resource task, by building cascaded systems from Whisper and NLLB-200 and applying data augmentation such as back-translation. The system was submitted to IWSLT 2025, but the abstract reports no concrete performance numbers.
This paper describes our system submission to the low-resource languages track of the International Conference on Spoken Language Translation (IWSLT 2025), specifically for Bemba-to-English speech translation. We built cascaded speech translation systems based on Whisper and NLLB-200 and employed data augmentation techniques such as back-translation. We investigate the effect of using synthetic data and discuss our experimental setup.
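The two ingredients named above, a cascade (speech recognition followed by text translation) and back-translation to create synthetic training data for the translation stage, can be sketched as follows. This is a minimal illustration under stated assumptions, not the submitted system: the model callables are hypothetical stand-ins for Whisper and NLLB-200, the helper names (`cascade_translate`, `back_translate`, `mix_training_data`) are invented for illustration, and the optional tag on synthetic sources (tagged back-translation) is an assumed variant that the abstract does not mention. Placeholder strings replace real Bemba and English text.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical stand-in model signatures. In a real system these callables
# would wrap an ASR model such as Whisper and an MT model such as NLLB-200;
# here they are plain functions so the sketch runs without any model download.
ASRModel = Callable[[bytes], str]  # audio -> source-language (Bemba) transcript
MTModel = Callable[[str], str]     # text -> translated text

Pair = Tuple[str, str]             # (source sentence, target sentence)


def cascade_translate(audio: bytes, asr: ASRModel, mt: MTModel) -> str:
    """Cascaded speech translation: transcribe the audio, then translate the transcript."""
    transcript = asr(audio)
    return mt(transcript)


def back_translate(target_monolingual: Iterable[str], reverse_mt: MTModel) -> List[Pair]:
    """Build synthetic parallel data from target-side (English) monolingual text.

    A reverse model (English -> Bemba) generates a synthetic source sentence,
    while the authentic English sentence is kept as the target.
    """
    return [(reverse_mt(tgt), tgt) for tgt in target_monolingual]


def mix_training_data(
    authentic: List[Pair],
    synthetic: List[Pair],
    tag: Optional[str] = None,
) -> List[Pair]:
    """Concatenate authentic and synthetic pairs.

    If `tag` is given, it is prefixed to each synthetic source sentence
    (tagged back-translation, an assumed option for illustration only).
    """
    if tag is not None:
        synthetic = [(f"{tag} {src}", tgt) for src, tgt in synthetic]
    return authentic + synthetic


# Toy usage: placeholder strings stand in for real Bemba/English sentences.
lexicon = {"bem-1": "en-1", "bem-2": "en-2"}
reverse_lexicon = {en: bem for bem, en in lexicon.items()}


def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" already is its transcript


def toy_mt(text: str) -> str:
    return lexicon.get(text, text)


def toy_reverse_mt(text: str) -> str:
    return reverse_lexicon.get(text, text)


print(cascade_translate(b"bem-1", toy_asr, toy_mt))  # en-1
synthetic_pairs = back_translate(["en-2"], toy_reverse_mt)
print(mix_training_data([("bem-1", "en-1")], synthetic_pairs, tag="<BT>"))
# [('bem-1', 'en-1'), ('<BT> bem-2', 'en-2')]
```

Keeping the two stages as separate callables mirrors the cascaded design: the ASR and MT components can be trained, augmented, or swapped independently, and back-translated pairs only ever touch the MT stage.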