CLCYNov 24, 2025

LLMs for Low-Resource Dialect Translation Using Context-Aware Prompting: A Case Study on Sylheti

arXiv:2511.21761v12 citationsHas Code
Originality Incremental advance
AI Analysis

It addresses machine translation for low-resource dialects like Sylheti, which is incremental as it adapts existing LLM prompting methods to a specific domain.

This study tackled the problem of low-resource dialect translation by investigating LLM-based machine translation for Sylheti, a dialect of Bangla, and introduced Sylheti-CAP, a context-aware prompting framework that improved translation quality across models, with reductions in hallucinations and ambiguities confirmed by automatic metrics and human evaluations.

Large Language Models (LLMs) have demonstrated strong translation abilities through prompting, even without task-specific training. However, their effectiveness in dialectal and low-resource contexts remains underexplored. This study presents the first systematic investigation of LLM-based machine translation (MT) for Sylheti, a dialect of Bangla that is itself low-resource. We evaluate five advanced LLMs (GPT-4.1, GPT-4.1, LLaMA 4, Grok 3, and DeepSeek V3.2) across both translation directions (Bangla $\Leftrightarrow$ Sylheti), and find that these models struggle with dialect-specific vocabulary. To address this, we introduce Sylheti-CAP (Context-Aware Prompting), a three-step framework that embeds a linguistic rulebook, a dictionary (2{,}260 core vocabulary items and idioms), and an authenticity check directly into prompts. Extensive experiments show that Sylheti-CAP consistently improves translation quality across models and prompting strategies. Both automatic metrics and human evaluations confirm its effectiveness, while qualitative analysis reveals notable reductions in hallucinations, ambiguities, and awkward phrasing, establishing Sylheti-CAP as a scalable solution for dialectal and low-resource MT. Dataset link: \href{https://github.com/TabiaTanzin/LLMs-for-Low-Resource-Dialect-Translation-Using-Context-Aware-Prompting-A-Case-Study-on-Sylheti.git}{https://github.com/TabiaTanzin/LLMs-for-Low-Resource-Dialect-Translation-Using-Context-Aware-Prompting-A-Case-Study-on-Sylheti.git}

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes