Benchmarking BERT-based Models for Sentence-level Topic Classification in Nepali Language
This work addresses the underexplored problem of NLP for Nepali by providing benchmarks, but the contribution is incremental because it applies existing methods to new data.
This study benchmarked ten BERT-based models for sentence-level topic classification in Nepali, a low-resource language. Indic models performed best: MuRIL-large achieved the highest F1-score of 90.60%, and NepBERTa remained competitive at 88.26%.
Transformer-based models such as BERT have significantly advanced Natural Language Processing (NLP) across many languages. However, Nepali, a low-resource language written in the Devanagari script, remains relatively underexplored. This study benchmarks multilingual, Indic, Hindi, and Nepali BERT variants to evaluate their effectiveness in Nepali topic classification. Ten pre-trained models, including mBERT, XLM-R, MuRIL, DevBERT, HindiBERT, IndicBERT, and NepBERTa, were fine-tuned and tested on a balanced Nepali dataset of 25,006 sentences across five conceptual domains. Performance was evaluated using accuracy, weighted precision, recall, F1-score, and AUROC. The results show that Indic models performed best: MuRIL-large achieved the highest F1-score of 90.60%, outperforming the multilingual and monolingual models. NepBERTa also performed competitively, with an F1-score of 88.26%. Overall, these findings establish a robust baseline for future document-level classification and broader Nepali NLP applications.
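To make the reported metrics concrete, the following is a minimal sketch of how accuracy and support-weighted precision, recall, and F1-score can be computed for a multi-class classifier. It is an illustration only: the abstract does not state which tooling the authors used, the function name `weighted_metrics` is hypothetical, and the integer labels are toy stand-ins for the five domains. AUROC is omitted because it requires predicted class probabilities rather than hard labels.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Each class's score is weighted by its share of the true labels,
    which is the usual meaning of "weighted" averaging.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example with three hypothetical class ids (not data from the study).
scores = weighted_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Weighted recall always equals accuracy under this definition. On a balanced dataset such as the one described, weighted and macro averages are identical as well, since every class carries the same weight.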