CL, May 8, 2024

Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

arXiv:2405.04829v2, 32 citations, h-index: 24, NAACL
Originality: Synthesis-oriented
AI Analysis

This paper addresses low-resource NER for Indian languages. The contribution is incremental, since it applies existing fine-tuning methods to new data.

The paper tackles the limited research on Named Entity Recognition (NER) for Indian languages by creating a human-annotated corpus of 40K sentences for 4 languages and fine-tuning a multilingual model, which achieves an average F1 score of 0.80 on their dataset and comparable performance on unseen benchmarks.

Named Entity Recognition (NER) is a useful component in Natural Language Processing (NLP) applications. It is used in various tasks such as Machine Translation, Summarization, Information Retrieval, and Question-Answering systems. Research on NER is centered on English and some other major languages, whereas limited attention has been given to Indian languages. We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian Languages. We present a human-annotated named entity corpus of 40K sentences for 4 Indian languages from two of the major Indian language families. Additionally, we present a multilingual model fine-tuned on our dataset, which achieves an average F1 score of 0.80 on our dataset. We achieve comparable performance on completely unseen benchmark datasets for Indian languages, which affirms the usability of our model.
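For readers unfamiliar with how an NER F1 score such as the 0.80 above is typically computed, the following is a minimal sketch of entity-level (span-level) micro F1 over BIO-tagged sequences. It assumes a CoNLL-style exact-match evaluation, which this page does not confirm is the paper's protocol. The function names `bio_to_spans` and `entity_f1` and the `PER`/`LOC` labels are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence.

    Assumption: a stray I- tag with no matching open span is treated
    leniently as the start of a new entity.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1: a predicted span counts only on exact match
    of both its boundaries and its type."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only; the paper's tag set is not given on this page.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Here one of the two gold entities is recovered and nothing spurious is predicted, so precision is 1.0 and recall is 0.5. A per-language average, as reported in the abstract, would presumably apply such a score to each language's test set and then take the mean.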

