CL · Feb 2, 2021

Clickbait Headline Detection in Indonesian News Sites using Multilingual Bidirectional Encoder Representations from Transformers (M-BERT)

Muhammad N. Fakhruzzaman, Saidah Z. Jannah, Ratih A. Ningrum, Indah Fahmiyah

arXiv:2102.01497v1 · 16 citations

Originality: Incremental advance

AI Analysis

This work provides a method for Indonesian news sites to detect clickbait headlines, which can help maintain the credibility of established news organizations.

This paper addresses the problem of clickbait headline detection in Indonesian news sites. The authors developed a classifier using M-BERT combined with a 100-node hidden layer and a sigmoid classifier, achieving an accuracy of 0.914, F1-score of 0.914, precision of 0.916, and ROC-AUC of 0.92 on a dataset of 6632 headlines.
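The classifier head described above (embedding, then a 100-node hidden layer, then a sigmoid output) can be sketched as a plain forward pass. This is a minimal illustration, not the authors' implementation: the 768-dimensional input matches the hidden size of BERT-base models, but the ReLU activation, the random weight initialization, and all function names are assumptions, and a random vector stands in for a real M-BERT sentence embedding.

```python
import math
import random

EMBED_DIM = 768   # hidden size of BERT-base models such as M-BERT
HIDDEN_DIM = 100  # the "100-node hidden layer" from the summary


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights; illustrative only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    # Assumption: the source does not name the hidden-layer activation.
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def clickbait_probability(embedding, hidden, output):
    """Forward pass: embedding -> 100 hidden units -> sigmoid probability."""
    h = relu(dense(embedding, *hidden))
    logit = dense(h, *output)[0]
    return sigmoid(logit)


rng = random.Random(0)
hidden = init_layer(EMBED_DIM, HIDDEN_DIM, rng)
output = init_layer(HIDDEN_DIM, 1, rng)

# Stand-in for an M-BERT sentence embedding; a real one would come from the encoder.
fake_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
p = clickbait_probability(fake_embedding, hidden, output)
label = "clickbait" if p >= 0.5 else "not clickbait"
```

With untrained weights the probability is meaningless; the sketch only shows how a single sigmoid unit turns the hidden representation into a score that a 0.5 threshold maps to a binary label.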

Click counts are tied to the amount of money that online advertisers pay to news sites. This business model has pushed some news sites to resort to click-baiting, i.e., using hyperbolic and enticing words, and sometimes an unfinished sentence, in a headline to deliberately tease readers. Some Indonesian online news sites have also adopted clickbait, which indirectly degrades the credibility of other, established news sites. To detect clickbait headlines, a neural network was trained in which the pre-trained multilingual language model M-BERT acts as an embedding layer, followed by a 100-node hidden layer and a sigmoid classifier on top. With a total of 6632 headlines as the training dataset, the classifier performed well: evaluated with 5-fold cross-validation, it achieved an accuracy of 0.914, an F1-score of 0.914, a precision of 0.916, and a ROC-AUC of 0.92. The results show that multilingual BERT can be used for Indonesian text classification, with room for further improvement. Future possibilities, societal impact, and limitations of clickbait detection are discussed.
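The evaluation protocol named in the abstract (5-fold cross-validation reporting accuracy, precision, F1, and ROC-AUC) can be illustrated with a small sketch. The helper names are hypothetical, the split is a plain shuffled split (the abstract does not say whether the folds were stratified), and the metrics are computed for the positive class only, which may differ from the averaging the authors used. The toy labels and scores are made up to exercise the functions and are not data from the paper.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision and F1 for labels in {0, 1} (1 = clickbait)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Each of the 5 folds serves once as the held-out set; the other 4 are used for training.
folds = k_fold_indices(6632, k=5)
fold_sizes = [len(fold) for fold in folds]

# Toy labels and scores, only to exercise the metric functions.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In a full run, the per-fold metrics would be averaged across the five held-out folds to obtain summary figures like those quoted in the abstract.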
