CL · LG · Nov 9, 2020

Bangla Text Classification using Transformers

arXiv:2011.04446v1 · 51 citations
AI Analysis

This work addresses text classification for Bangla, a resource-limited language, by applying existing transformer methods to new data, representing an incremental advancement.

The authors tackled Bangla text classification by fine-tuning multilingual transformer models across multiple domains, achieving state-of-the-art results with accuracy improvements of 5-29% on six benchmark datasets.

Text classification is one of the earliest problems in NLP. Over time, the scope of application areas has broadened and the difficulty of dealing with new areas (e.g., noisy social media content) has increased. The problem-solving strategy has shifted from classical machine learning to deep learning algorithms. One of the recent deep neural network architectures is the Transformer. Models built on this type of network and its variants have recently succeeded in many downstream natural language processing tasks, especially for resource-rich languages such as English. However, these models have not been fully explored for Bangla text classification tasks. In this work, we fine-tune multilingual transformer models for Bangla text classification tasks in different domains, including sentiment analysis, emotion detection, news categorization, and authorship attribution. We obtain state-of-the-art results on six benchmark datasets, improving upon the previous results by 5-29% accuracy across different tasks.
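As a rough illustration of what fine-tuning a multilingual transformer for classification involves, the following is a minimal sketch of a single training step using the Hugging Face `transformers` and PyTorch libraries. It is not the authors' code. The checkpoint name `bert-base-multilingual-cased`, the learning rate, and the helper names `build_label_map` and `fine_tune_step` are assumptions made for illustration; the abstract does not name the models or hyperparameters used.

```python
def build_label_map(labels):
    """Map each distinct string label to a stable integer id (sorted order)."""
    return {name: idx for idx, name in enumerate(sorted(set(labels)))}


def fine_tune_step(texts, labels,
                   model_name="bert-base-multilingual-cased",  # assumed checkpoint
                   lr=2e-5):                                   # assumed learning rate
    """Run one gradient step of sequence-classification fine-tuning.

    Heavy imports are local so build_label_map works without torch installed.
    Calling this function downloads the pretrained checkpoint on first use.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    label_map = build_label_map(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # A freshly initialised classification head is placed on the pretrained encoder.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(label_map)
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    # Tokenise the raw texts into padded, truncated tensors.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    targets = torch.tensor([label_map[name] for name in labels])

    model.train()
    loss = model(**batch, labels=targets).loss  # cross-entropy over the classes
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In practice this step would be repeated over mini-batches for several epochs, with accuracy measured on a held-out test split of each dataset.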
