CL AI LG · Apr 5, 2023

Bengali Fake Review Detection using Semi-supervised Generative Adversarial Networks

Md. Tanvir Rouf Shawon, G. M. Shahariar, Faisal Muhammad Shah, Mohammad Shafiul Alam, Md. Shahriar Mahbub

arXiv:2304.02739v11.39 citations · h-index: 15

Originality: Incremental advance

AI Analysis

The paper addresses fake review detection in Bengali, which matters to both consumers and researchers, and offers an approach to classification problems where labeled data is scarce. The advance is incremental, since it adapts existing methods to a specific domain.

This paper tackled the problem of detecting fake reviews in Bengali, a low-resource language, by proposing a semi-supervised GAN-LM architecture that achieved 83.59% accuracy and 84.89% F1-score with only 1024 annotated samples, outperforming other models by up to 10%.

This paper investigates the potential of semi-supervised Generative Adversarial Networks (GANs) to fine-tune pretrained language models in order to distinguish fake Bengali reviews from real ones using only a small amount of annotated data. With the rise of social media and e-commerce, the ability to detect fake or deceptive reviews is becoming increasingly important in order to protect consumers from being misled by false information. Identifying a fake review is difficult for any machine learning model, especially in a low-resource language like Bengali. We demonstrate that the proposed semi-supervised GAN-LM architecture (a generative adversarial network on top of a pretrained language model) is a viable solution for classifying Bengali fake reviews. The experimental results suggest that even with only 1024 annotated samples, BanglaBERT with semi-supervised GAN (SSGAN) achieved an accuracy of 83.59% and an F1-score of 84.89%, outperforming three other pretrained language models (BanglaBERT generator, Bangla BERT Base and Bangla-Electra) by almost 3%, 4% and 10% respectively in terms of accuracy. The experiments were conducted on a manually labeled food review dataset consisting of a total of 6014 real and fake reviews collected from various social media groups. Researchers who struggle not just with fake review detection but with other classification problems owing to a lack of labeled data may find a solution in our proposed methodology.
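The abstract does not spell out the training objective, so the following is only a minimal sketch of how a semi-supervised GAN classifier head is commonly set up, assuming a GAN-BERT-style formulation: the discriminator outputs one logit per review class plus one extra "generated" class. The function names, the class indices and the `EPS` constant are hypothetical, the logits would in practice come from a head on top of the language model's sentence representation, and the authors' actual losses may differ (for example by adding a feature-matching term for the generator).

```python
import math

# Assumed label layout (hypothetical): 0 = real review, 1 = fake review,
# 2 = extra class for samples produced by the generator.
GENERATED = 2
EPS = 1e-12  # guards against log(0)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(logits, label=None, from_generator=False):
    """Loss for one example under the assumed k+1 class setup.

    - Generator samples should be assigned to the GENERATED class.
    - Real text (labeled or not) should NOT be assigned to GENERATED.
    - Labeled real text additionally pays a cross-entropy term over the
      true review classes only (the GENERATED logit is dropped).
    """
    p_gen = softmax(logits)[GENERATED]
    if from_generator:
        return -math.log(p_gen + EPS)
    loss = -math.log(1.0 - p_gen + EPS)  # unsupervised term
    if label is not None:
        class_probs = softmax(logits[:GENERATED])
        loss += -math.log(class_probs[label] + EPS)  # supervised term
    return loss


def generator_loss(logits):
    """The generator is rewarded when its samples are not flagged as GENERATED."""
    return -math.log(1.0 - softmax(logits)[GENERATED] + EPS)


if __name__ == "__main__":
    print(discriminator_loss([2.0, 0.5, -1.0], label=0))        # labeled real review
    print(discriminator_loss([0.3, 0.1, -0.5]))                 # unlabeled review
    print(discriminator_loss([0.0, 0.0, 1.5], from_generator=True))
    print(generator_loss([0.0, 0.0, 1.5]))
```

In this setup, unlabeled reviews still contribute a training signal through the "not generated" term, which is one way a model can benefit from data beyond the small annotated subset.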
