Muhammad Hazim Al Farouq

CL
h-index: 8
3 papers
5 citations
Novelty: 18%
AI Score: 23

3 Papers

CL, Feb 17, 2025
SpeechT: Findings of the First Mentorship in Speech Translation

Yasmin Moslem, Juan Julián Cea Morán, Mariano Gonzalez-Gomez et al.

This work presents the details and findings of the first mentorship in speech translation (SpeechT), which took place in December 2024 and January 2025. To fulfil the mentorship requirements, the participants engaged in key activities, including data preparation, modelling, and advanced research. The participants explored data augmentation techniques and compared end-to-end and cascaded speech translation systems. The projects covered various languages other than English, including Arabic, Bengali, Galician, Indonesian, Japanese, and Spanish.

CL, Oct 26, 2025
Iterative Layer Pruning for Efficient Translation Inference

Yasmin Moslem, Muhammad Hazim Al Farouq, John D. Kelleher

Large language models (LLMs) have transformed many areas of natural language processing, including machine translation. However, efficient deployment of LLMs remains challenging due to their intensive computational requirements. In this paper, we address this challenge and present our submissions to the Model Compression track at the Conference on Machine Translation (WMT 2025). In our experiments, we investigate iterative layer pruning guided by layer importance analysis. We evaluate this method using the Aya-Expanse-8B model for translation from Czech to German, and from English to Egyptian Arabic. Our approach achieves substantial reductions in model size and inference time, while maintaining the translation quality of the baseline models.

CL, May 5, 2025
Bemba Speech Translation: Exploring a Low-Resource African Language

Muhammad Hazim Al Farouq, Aman Kassahun Wassie, Yasmin Moslem

This paper describes our system submission to the low-resource languages track of the International Conference on Spoken Language Translation (IWSLT 2025), specifically for Bemba-to-English speech translation. We built cascaded speech translation systems based on Whisper and NLLB-200 and employed data augmentation techniques such as back-translation. We investigate the effect of using synthetic data and discuss our experimental setup.