CL · AI · Jan 26, 2024

Airavata: Introducing Hindi Instruction-tuned LLM

arXiv:2401.15006v2 · 41 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the shortage of AI tools for Hindi speakers. Its contribution is incremental, since it builds on existing models and datasets.

The researchers tackled the lack of instruction-tuned large language models for Hindi by developing Airavata, an LLM fine-tuned on diverse Hindi datasets, and introduced the IndicInstruct dataset and evaluation benchmarks to support further research in Indic languages.

We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi on diverse Hindi instruction-tuning datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, a collection of diverse instruction-tuning datasets intended to enable further research on Indic LLMs. Additionally, we present evaluation benchmarks and a framework for assessing LLM performance across tasks in Hindi. Currently, Airavata supports only Hindi, but we plan to expand it to all 22 scheduled Indic languages. You can access all artifacts at https://ai4bharat.github.io/airavata.

Code Implementations: 1 repo
