CL

Aug 13, 2024

Multilingual Models for Check-Worthy Social Media Posts Detection

arXiv:2408.06737v1

1.92 citations

h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of identifying check-worthy content for fact-checkers and moderators across multiple languages, including low-resource ones. Its contribution is incremental in nature.

The study tackles the detection of verifiable factual claims and harmful claims in social media posts by developing multilingual transformer-based models, and it validates the results against state-of-the-art models to show that the proposed models are robust.

This work presents an extensive study of transformer-based NLP models for detecting social media posts that contain verifiable factual claims and harmful claims. The study covers dataset collection, dataset pre-processing, architecture selection, configuration of settings, model training (fine-tuning), model testing, and implementation. It includes a comprehensive analysis of different models, with a special focus on multilingual models, where a single model can process social media posts both in English and in low-resource languages such as Arabic, Bulgarian, Dutch, Polish, Czech, and Slovak. The results were validated against state-of-the-art models, and the comparison demonstrated the robustness of the proposed models. The novelty of this work lies in the development of multi-label multilingual classification models that can efficiently detect, at the same time, harmful posts and posts that contain verifiable factual claims.
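To make the multi-label idea concrete, the sketch below shows one common way a classifier with two outputs can flag a post as harmful, as containing a verifiable factual claim, both, or neither: each label gets its own sigmoid and threshold rather than competing in a softmax. This is an illustration under assumptions, not the authors' implementation; the label names, the 0.5 threshold, and the example logits are all hypothetical.

```python
import math

# Hypothetical label identifiers; the abstract names the two tasks
# but does not specify how the labels are encoded.
LABELS = ("verifiable_factual_claim", "harmful")


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Turn one raw score per label into independent yes/no decisions.

    Each label is thresholded on its own, so a post can receive
    both labels, one label, or none (assumed threshold: 0.5).
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {
        label: sigmoid(score) >= threshold
        for label, score in zip(LABELS, logits)
    }


# Made-up scores standing in for the output of a fine-tuned model.
example = decode_multilabel([2.0, -1.0])
print(example)
```

Because the decisions are independent, a single forward pass of one model can answer both questions for a post, which is the efficiency argument the abstract makes for a multi-label design.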
