CL, LG | Nov 5, 2023

mahaNLP: A Marathi Natural Language Processing Library

arXiv:2311.02579v1127 citations | h-index: 2 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of NLP tools for Marathi speakers and researchers. It is incremental, however, because it builds on existing transformer models and focuses on a specific domain.

The authors address the lack of comprehensive NLP support for the low-resource Marathi language by developing mahaNLP, an open-source library built on state-of-the-art MahaBERT models. It covers tasks ranging from preprocessing to advanced functions such as sentiment analysis and NER.

We present mahaNLP, an open-source natural language processing (NLP) library built specifically for the Marathi language. It aims to improve NLP support for Marathi, a low-resource Indian language. It is an easy-to-use, extensible, and modular toolkit for Marathi text analysis built on state-of-the-art MahaBERT-based transformer models. Our work is significant because other existing Indic NLP libraries provide only basic Marathi processing support and rely on older models with restricted performance. Our toolkit stands out by offering a comprehensive array of NLP tasks, covering both fundamental preprocessing tasks and advanced tasks such as sentiment analysis, NER, hate speech detection, and sentence completion. This paper gives an overview of the mahaNLP framework, its features, and its usage. This work is part of the L3Cube MahaNLP initiative; more information is available at https://github.com/l3cube-pune/MarathiNLP.

Code Implementations: 1 repo
