CL CY Oct 20, 2025

Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti

arXiv:2510.18898v14.91 citations, h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses machine translation for the low-resource language Sylheti and contributes to inclusive language technologies, though the contribution is incremental: it applies existing methods to a new language pair.

The study tackled Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot LLMs, finding that fine-tuned models significantly outperformed LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity.

Machine Translation (MT) has advanced from rule-based and statistical methods to neural approaches based on the Transformer architecture. While these methods have achieved impressive results for high-resource languages, low-resource varieties such as Sylheti remain underexplored. In this work, we investigate Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot large language models (LLMs). Experimental results demonstrate that fine-tuned models significantly outperform LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity. These findings highlight the importance of task-specific adaptation for underrepresented languages and contribute to ongoing efforts toward inclusive language technologies.
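
The abstract reports "character-level fidelity" without naming the metric here. One common way to measure it is a character n-gram F-score in the style of chrF. The sketch below is a simplified illustration under that assumption, not the authors' evaluation code and not a reimplementation of any particular library; the function names, the maximum n-gram order of 6, and beta = 2 are choices made for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def char_fscore(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1]; higher means closer to the reference."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One side is too short to contain any n-grams of this order.
            continue
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Placeholder strings for illustration only, not data from the paper.
    print(char_fscore("translation output", "translation outputs"))
```

Because it works on characters rather than whole words, a score of this kind gives partial credit for near-miss spellings, which is one reason character-level metrics are often reported alongside word-level ones for closely related language varieties.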
