CL · Aug 29, 2023

Vulgar Remarks Detection in Chittagonian Dialect of Bangla

Tanjim Mahmud, Michal Ptaszynski, Fumito Masui

arXiv:2308.15448v1 · 1 citation · h-index: 16

Originality: Synthesis-oriented

AI Analysis

This work addresses online bullying and harassment affecting users of the Chittagonian dialect, but it is incremental, as it applies existing NLP/ML methods to a new low-resource language.

The study tackles the problem of detecting vulgar remarks on social media in the low-resource Chittagonian dialect of Bangla. Logistic Regression achieved promising results (0.91 accuracy), while neural network methods such as a simple RNN with Word2vec and fastText embeddings were less accurate (0.84 to 0.90).

The negative effects of online bullying and harassment are increasing with Internet popularity, especially on social media. One solution is to use natural language processing (NLP) and machine learning (ML) methods to detect harmful remarks automatically, but these methods are limited in low-resource languages such as the Chittagonian dialect of Bangla. This study focuses on detecting vulgar remarks on social media using supervised ML and deep learning algorithms. Logistic Regression achieved promising accuracy (0.91), while a simple RNN with Word2vec and fastText had lower accuracy (0.84-0.90), highlighting that neural network (NN) algorithms require more data.
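To make the Logistic Regression baseline concrete, here is a minimal, dependency-free sketch of a binary vulgar/non-vulgar text classifier. It is not the authors' implementation: the whitespace tokenizer, bag-of-words count features, gradient-descent training, learning rate, and epoch count are all illustrative assumptions, and the helper names (`build_vocab`, `vectorize`, `train_logreg`) are hypothetical. The toy sentences use placeholder tokens (`vulg_a`, `vulg_b`) instead of real Chittagonian data, which is not reproduced here.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; the paper's preprocessing may differ.
    return text.lower().split()

def build_vocab(texts):
    # Map each distinct training token to a feature index.
    vocab = {}
    for t in texts:
        for tok in tokenize(t):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def vectorize(text, vocab):
    # Bag-of-words counts over the training vocabulary; unseen tokens are ignored.
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] = float(n)
    return vec

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=200):
    # Batch gradient descent on mean binary cross-entropy (hypothetical hyperparameters).
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / m for wj, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b

def predict(w, b, x):
    # Label 1 (vulgar) when the predicted probability is at least 0.5.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: 1 = vulgar, 0 = not vulgar (placeholder tokens only).
train = [
    ("you are a vulg_a", 1),
    ("what a vulg_b person", 1),
    ("vulg_a vulg_b go away", 1),
    ("have a nice day", 0),
    ("what a lovely person", 0),
    ("thank you friend", 0),
]
texts = [t for t, _ in train]
labels = [l for _, l in train]
vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
w, b = train_logreg(X, labels)
train_preds = [predict(w, b, x) for x in X]
print("training accuracy:", accuracy(labels, train_preds))
print("'such a vulg_a' ->", predict(w, b, vectorize("such a vulg_a", vocab)))
```

Training accuracy on six toy sentences says nothing about real performance; a figure like the reported 0.91 only makes sense when measured on held-out labeled data. A simple linear model like this has few parameters to fit, which is consistent with the abstract's point that NN models need more data than a small low-resource corpus provides.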

