CL LG | Nov 13, 2024

Bangla Grammatical Error Detection Leveraging Transformer-based Token Classification

Shayekh Bin Islam, Ridwanul Hasan Tanvir, Sihat Afnan

arXiv:2411.08344v11.02 citations | h-index: 3

Originality: Synthesis-oriented

AI Analysis

This paper addresses the understudied problem of building a grammar checker for Bangla, a widely spoken language, which is crucial for automated typing assistants. The contribution is an incremental advancement.

The paper tackles automated Bangla grammatical error detection by framing it as a token classification problem using transformer-based models and rule-based post-processing, achieving a Levenshtein distance score of 1.04 on a dataset of over 25,000 texts.

Bangla is the seventh most spoken language in the world by total number of speakers, yet the development of an automated grammar checker for it remains understudied. Bangla grammatical error detection is the task of detecting substrings of a Bangla text that contain grammatical, punctuation, or spelling errors, and it is crucial for developing an automated Bangla typing assistant. Our approach frames the task as a token classification problem and uses state-of-the-art transformer-based models. We then combine the outputs of these models and apply rule-based post-processing to generate a more reliable and comprehensive result. Our system is evaluated on a dataset of over 25,000 texts from various sources, and our best model achieves a Levenshtein distance score of 1.04. Finally, we provide a detailed analysis of the different components of our system.
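The pipeline described in the abstract (per-token labels, combined model outputs, a Levenshtein distance score) can be illustrated with a minimal Python sketch. Several details here are assumptions made only for illustration and are not taken from the paper: the binary 0/1 label scheme, the strict majority vote used to combine models, the `$` marker wrapped around flagged tokens, and whitespace-joined tokens. The helper names `majority_vote` and `mark_errors` are hypothetical. Only the edit-distance function is a standard, well-defined algorithm; how the reported 1.04 score is aggregated over texts is not specified here.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-token 0/1 labels from several models by strict majority.

    Assumption: the source only says model outputs are combined; majority
    voting is one simple way to do that, not necessarily the authors' way.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(column) > n_models else 0 for column in zip(*predictions)]


def mark_errors(tokens: Sequence[str], labels: Sequence[int], marker: str = "$") -> str:
    """Wrap every token labelled 1 in `marker` and rejoin tokens with spaces.

    Assumption: the `$` marker and whitespace joining are illustrative only.
    """
    return " ".join(
        f"{marker}{token}{marker}" if label else token
        for token, label in zip(tokens, labels)
    )


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Three hypothetical models label the same three tokens.
tokens = ["a", "b", "c"]
combined = majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]])
predicted = mark_errors(tokens, combined)
reference = mark_errors(tokens, [0, 1, 1])
distance = levenshtein(predicted, reference)
```

Under this setup, a lower distance between the predicted and reference marked strings means the flagged substrings agree more closely, which is why a small Levenshtein score indicates a better system.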
