IR · CL · LG · Nov 19, 2019

Automatic Detection of Satire in Bangla Documents: A CNN Approach Based on Hybrid Feature Extraction Model

Arnab Sen Sharma, Maruf Ahmed Mridul, Md Saiful Islam

arXiv:1911.11062v1 · 5.5 · 17 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of identifying ambiguous satirical content in Bangla for online communities. The contribution is incremental, since it applies existing methods to a new language domain.

The paper tackles the problem of detecting satire in Bangla documents from online sources, achieving an accuracy of over 96% with a CNN that uses a hybrid Word2Vec and TF-IDF feature extraction model.

The spread of satirical news in online communities is an ongoing trend. Satire is so inherently ambiguous that sometimes even humans find it hard to tell whether a piece is actually satire or not, so research interest in this field has grown. The purpose of this research is to detect Bangla satirical news spread through online news portals as well as social media. In this paper, we propose a hybrid technique for extracting features from text documents that combines Word2Vec and TF-IDF. Using our proposed feature extraction technique with a standard CNN architecture, we could detect whether a Bangla text document is satire or not with an accuracy of more than 96%.
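The abstract does not say exactly how Word2Vec and TF-IDF are combined. The sketch below assumes one plausible scheme: each token's word vector is scaled by that token's TF-IDF weight, and the scaled vectors are stacked into a fixed-size matrix that a CNN can consume. It is an illustration under stated assumptions, not the authors' implementation. `TOY_EMBEDDINGS` is a hypothetical stand-in for a trained Word2Vec model, the smoothed IDF formula is a common convention chosen here for illustration, and `conv1d_relu_maxpool` shows a single convolutional filter with global max pooling rather than the paper's full CNN architecture.

```python
import math
from collections import Counter

# Hypothetical stand-in for a trained Word2Vec model (token -> dense vector).
# Real vectors would come from Word2Vec trained on a Bangla corpus.
TOY_EMBEDDINGS = {
    "সরকার": [0.2, 0.7, -0.1],
    "নতুন": [0.5, -0.3, 0.4],
    "ঘোষণা": [0.1, 0.9, 0.3],
    "হাসি": [-0.6, 0.2, 0.8],
}
DIM = 3


def tfidf_weights(docs):
    """Per-document {token: tf-idf}, with assumed smoothed idf = ln((1+N)/(1+df)) + 1."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each token at most once per document
    result = []
    for doc in docs:
        tf = Counter(doc)
        result.append({
            tok: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return result


def hybrid_matrix(doc, weights, embeddings, max_len):
    """Scale each token's embedding by its tf-idf weight; pad/truncate to max_len rows."""
    rows = []
    for tok in doc[:max_len]:
        vec = embeddings.get(tok, [0.0] * DIM)  # out-of-vocabulary token -> zero vector
        w = weights.get(tok, 0.0)
        rows.append([w * x for x in vec])
    rows.extend([[0.0] * DIM for _ in range(max_len - len(rows))])  # zero padding
    return rows


def conv1d_relu_maxpool(matrix, kernel, bias=0.0):
    """One filter spanning len(kernel) consecutive rows, ReLU, then global max pooling."""
    k = len(kernel)
    activations = []
    for i in range(len(matrix) - k + 1):
        s = bias + sum(kernel[j][d] * matrix[i + j][d] for j in range(k) for d in range(DIM))
        activations.append(max(0.0, s))
    return max(activations)


# Usage: two tiny tokenized "documents"; "সরকার" appears in both, so its idf is lower.
docs = [["সরকার", "নতুন", "ঘোষণা"], ["হাসি", "সরকার"]]
weights = tfidf_weights(docs)
m = hybrid_matrix(docs[0], weights[0], TOY_EMBEDDINGS, max_len=4)
kernel = [[1.0] * DIM, [1.0] * DIM]  # filter covering two consecutive tokens
feature = conv1d_relu_maxpool(m, kernel)
print(len(m), len(m[0]), feature)
```

Scaling by TF-IDF lets a token that is frequent in one document but rare across the corpus contribute a larger vector than a token that appears everywhere, while the Word2Vec part keeps semantic similarity between words. A real model would learn many filters of several widths and feed the pooled features to a classifier.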

