CL · Mar 14, 2025

TigerLLM - A Family of Bangla Large Language Models

arXiv:2503.10995v3 · 12 citations · h-index: 6 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This work addresses the linguistic disparity that Bangla speakers face in LLMs and provides a new baseline for future development.

The authors tackle the lack of high-performance Bangla large language models by introducing TigerLLM, a family of models reported to surpass all open-source alternatives and to outperform larger proprietary models such as GPT3.5 across standard benchmarks.

The development of Large Language Models (LLMs) remains heavily skewed towards English and a few other high-resource languages. This linguistic disparity is particularly evident for Bangla, the 5th most spoken language. A few initiatives have attempted to create open-source Bangla LLMs, but their performance still lags behind that of models for high-resource languages, and their reproducibility is limited. To address this gap, we introduce TigerLLM, a family of Bangla LLMs. Our results demonstrate that these models surpass all open-source alternatives and also outperform larger proprietary models like GPT3.5 across standard benchmarks, establishing TigerLLM as the new baseline for future Bangla language modeling.

Code Implementations: 1 repo
