CL, CY | Oct 20, 2025

Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti

arXiv:2510.18898v1 | 1 citation | h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses machine translation for the low-resource language Sylheti and contributes to inclusive language technologies. It is incremental, however, in that it applies existing methods to a new language pair.

The study tackled Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot LLMs, finding that fine-tuned models significantly outperformed LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity.

Machine Translation (MT) has advanced from rule-based and statistical methods to neural approaches based on the Transformer architecture. While these methods have achieved impressive results for high-resource languages, low-resource varieties such as Sylheti remain underexplored. In this work, we investigate Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot large language models (LLMs). Experimental results demonstrate that fine-tuned models significantly outperform LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity. These findings highlight the importance of task-specific adaptation for underrepresented languages and contribute to ongoing efforts toward inclusive language technologies.
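
The abstract does not say which metric underlies "character-level fidelity". Assuming it refers to a character n-gram F-score in the style of chrF, the idea can be sketched as below. This is a simplified illustration, not the authors' evaluation code: the function names `char_ngrams` and `char_fscore` are hypothetical, and the score averages a per-order F-beta, whereas standard chrF implementations average precision and recall across n-gram orders before combining them.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    s = " ".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def char_fscore(hypothesis: str, reference: str,
                max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified chrF-style score: mean F-beta over n-gram orders 1..max_n.

    Assumption: beta = 2 weights recall more heavily than precision,
    as is common for chrF; the paper's exact settings are not given.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One side is too short to contain any n-gram of this order.
            continue
        # Clipped overlap: each n-gram counts at most as often as in both texts.
        overlap = sum((hyp & ref).values())
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        b2 = beta ** 2
        scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return sum(scores) / len(scores) if scores else 0.0
```

Because the score works on characters rather than whole words, a hypothesis that differs from the reference by a single suffix or spelling variant still earns partial credit, which is one reason character-level metrics are often preferred for morphologically rich or orthographically variable languages.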

