VERITAS-NLI: Validation and Extraction of Reliable Information Through Automated Scraping and Natural Language Inference
This work addresses the problem of fake news detection for online platforms and society, offering a novel approach that reduces reliance on training data.
The paper tackles fake news detection by proposing a system that uses web-scraping and Natural Language Inference to verify headlines against external knowledge, achieving 84.3% accuracy and outperforming the best classical ML model by 33.3% and BERT by 31.0%.
As information spreads rapidly through online platforms, the rise of fake news poses an alarming threat to the integrity of public discourse, societal trust, and reputable news sources. Classical machine learning and Transformer-based models have been extensively studied for fake news detection; however, they are hampered by their reliance on training data and fail to generalize to unseen headlines. To address these challenges, we propose a novel solution that leverages web-scraping techniques and Natural Language Inference (NLI) models to retrieve the external knowledge needed to verify the accuracy of a headline. Our system is evaluated on a diverse, self-curated dataset spanning multiple news channels and broad domains. Our best-performing pipeline achieves an accuracy of 84.3%, surpassing the best classical Machine Learning model by 33.3% and Bidirectional Encoder Representations from Transformers (BERT) by 31.0%. This highlights the efficacy of combining dynamic web-scraping with Natural Language Inference to find support for a claimed headline in externally retrieved knowledge.
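
The retrieve-then-infer idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `verify_headline`, `Retriever`, `NLIScorer`, `Verdict`, and the `threshold` parameter are hypothetical, and the toy retriever and token-overlap scorer are offline stand-ins for a real web scraper and a real NLI model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces (assumptions, not taken from the paper):
Retriever = Callable[[str], List[str]]    # headline -> retrieved passages
NLIScorer = Callable[[str, str], float]   # (premise, hypothesis) -> support score in [0, 1]


@dataclass
class Verdict:
    headline: str
    label: str         # "real" or "fake"
    best_score: float  # highest support score over all retrieved passages


def verify_headline(headline: str, retrieve: Retriever, score: NLIScorer,
                    threshold: float = 0.5) -> Verdict:
    """Label a headline "real" if any retrieved passage supports it.

    The 0.5 threshold is an illustrative assumption.
    """
    passages = retrieve(headline)
    # Each passage is the premise; the headline is the hypothesis.
    best = max((score(p, headline) for p in passages), default=0.0)
    label = "real" if best >= threshold else "fake"
    return Verdict(headline, label, best)


def toy_retrieve(headline: str) -> List[str]:
    """Stand-in for web scraping: returns a fixed, made-up corpus."""
    return [
        "The central bank raised interest rates by a quarter point on Tuesday.",
        "Local team wins the regional championship after overtime.",
    ]


def toy_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: fraction of hypothesis tokens in the premise.

    Token overlap is NOT entailment; it only keeps the sketch runnable offline.
    """
    hyp = set(hypothesis.lower().split())
    prem = set(premise.lower().split())
    return len(hyp & prem) / len(hyp) if hyp else 0.0


supported = verify_headline("central bank raised interest rates",
                            toy_retrieve, toy_score)
unsupported = verify_headline("aliens land in paris", toy_retrieve, toy_score)
print(supported.label, unsupported.label)
```

Passing the retriever and scorer as callables keeps the decision logic independent of any particular scraper or model, so either stand-in could be swapped for a real component without changing `verify_headline`.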