Polite on the Surface, Wrong in Practice: A Curated Dataset for Fixing Honorific Failures in Multilingual Bangla Generation
For researchers and practitioners working on low-resource multilingual NLG, this work provides a targeted dataset and benchmark to fix culturally nuanced honorific errors, though the contribution is incremental.
The paper introduces BLADE, a curated dataset of 4,196 interaction pairs for instruction-tuning MLLMs to address honorific and pragmatic failures in Bangla generation. Fine-tuning DeepSeek-8B and LLaMA-3.2-3B with LoRA on this dataset yields substantial improvements in structural fidelity and honorific alignment.
Recent advances in Multilingual Large Language Models (MLLMs) have significantly enhanced cross-lingual conversational capabilities, yet modeling culturally nuanced and context-dependent communication remains a critical bottleneck. Specifically, existing state-of-the-art models exhibit a severe pragmatic gap when handling structural variations, regional idioms, and honorific consistencies in low-resource contexts like Bangla. To address this limitation, we introduce a novel, culturally aligned instruction-tuning dataset for \textbf{BangLa Application and DialoguE generation - BLADE} and benchmarking framework comprising $4,196$ meticulously curated interaction pairs. We leverage this resource to systematically fine-tune and evaluate leading open-weight architectures, including DeepSeek-8B and LLaMA-3.2-3B, utilizing parameter-efficient fine-tuning via LoRA adapters in a 4-bit NormalFloat (NF4) quantization framework. Our empirical evaluations demonstrate that models fine-tuned on our dataset yield substantial improvements in structural fidelity and honorific alignment, providing a rigorous benchmark for bridging pragmatic disparities in low-resource multilingual text generation. Code and dataset: https://github.com/ashuvo25/Bangla_Application_LLM/tree/main