CLMar 16, 2023
SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classificationBen Wu, Olesya Razuvayevskaya, Freddy Heppell et al.
This paper describes our approach for SemEval-2023 Task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup. For Subtask 1 (News Genre), we propose an ensemble of fully trained and adapter mBERT models which was ranked joint-first for German, and had the highest mean rank of multi-language teams. For Subtask 2 (Framing), we achieved first place in 3 languages, and the best average rank across all the languages, by using two separate ensembles: a monolingual RoBERTa-MUPPETLARGE and an ensemble of XLM-RoBERTaLARGE with adapters and task adaptive pretraining. For Subtask 3 (Persuasion Techniques), we train a monolingual RoBERTa-Base model for English and a multilingual mBERT model for the remaining languages, which achieved top 10 for all languages, including 2nd for English. For each subtask, we compared monolingual and multilingual approaches, and considered class imbalance techniques.
CLSep 14, 2023
Weakly Supervised Veracity Classification with LLM-Predicted Credibility SignalsJoão A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva et al.
Credibility signals represent a wide range of heuristics typically used by journalists and fact-checkers to assess the veracity of online content. Automating the extraction of credibility signals presents significant challenges due to the necessity of training high-accuracy, signal-specific extractors, coupled with the lack of sufficiently large annotated datasets. This paper introduces Pastel (Prompted weAk Supervision wiTh crEdibility signaLs), a weakly supervised approach that leverages large language models (LLMs) to extract credibility signals from web content, and subsequently combines them to predict the veracity of content without relying on human supervision. We validate our approach using four article-level misinformation detection datasets, demonstrating that Pastel outperforms zero-shot veracity detection by 38.3% and achieves 86.7% of the performance of the state-of-the-art system trained with human supervision. Moreover, in cross-domain settings where training and testing datasets originate from different domains, Pastel significantly outperforms the state-of-the-art supervised model by 63%. We further study the association between credibility signals and veracity, and perform an ablation study showing the impact of each signal on model performance. Our findings reveal that 12 out of the 19 proposed signals exhibit strong associations with veracity across all datasets, while some signals show domain-specific strengths.
CLJul 31, 2023
Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection TasksJoão A. Leite, Carolina Scarton, Diego F. Silva
Online social media is rife with offensive and hateful comments, prompting the need for their automatic detection given the sheer amount of posts created every second. Creating high-quality human-labelled datasets for this task is difficult and costly, especially because non-offensive posts are significantly more frequent than offensive ones. However, unlabelled data is abundant, easier, and cheaper to obtain. In this scenario, self-training methods, using weakly-labelled examples to increase the amount of training data, can be employed. Recent "noisy" self-training approaches incorporate data augmentation techniques to ensure prediction consistency and increase robustness against noisy data and adversarial attacks. In this paper, we experiment with default and noisy self-training using three different textual data augmentation techniques across five different pre-trained BERT architectures varying in size. We evaluate our experiments on two offensive/hate-speech datasets and demonstrate that (i) self-training consistently improves performance regardless of model size, resulting in up to +1.5% F1-macro on both datasets, and (ii) noisy self-training with textual data augmentations, despite being successfully applied in similar settings, decreases performance on offensive and hate-speech domains when compared to the default method, even with state-of-the-art augmentations such as backtranslation.
CLJan 23
LLM-Based Adversarial Persuasion Attacks on Fact-Checking SystemsJoão A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva et al.
Automated fact-checking (AFC) systems are susceptible to adversarial attacks, enabling false claims to evade detection. Existing adversarial frameworks typically rely on injecting noise or altering semantics, yet no existing framework exploits the adversarial potential of persuasion techniques, which are widely used in disinformation campaigns to manipulate audiences. In this paper, we introduce a novel class of persuasive adversarial attacks on AFCs by employing a generative LLM to rephrase claims using persuasion techniques. Considering 15 techniques grouped into 6 categories, we study the effects of persuasion on both claim verification and evidence retrieval using a decoupled evaluation strategy. Experiments on the FEVER and FEVEROUS benchmarks show that persuasion attacks can substantially degrade both verification performance and evidence retrieval. Our analysis identifies persuasion techniques as a potent class of adversarial attacks, highlighting the need for more robust AFC systems.
CYDec 19, 2024
A Cross-Domain Study of the Use of Persuasion Techniques in Online DisinformationJoão A. Leite, Olesya Razuvayevskaya, Carolina Scarton et al.
Disinformation, irrespective of domain or language, aims to deceive or manipulate public opinion, typically through employing advanced persuasion techniques. Qualitative and quantitative research on the weaponisation of persuasion techniques in disinformation has been mostly topic-specific (e.g., COVID-19) with limited cross-domain studies, resulting in a lack of comprehensive understanding of these strategies. This study employs a state-of-the-art persuasion technique classifier to conduct a large-scale, multi-domain analysis of the role of 16 persuasion techniques in disinformation narratives. It shows how different persuasion techniques are employed disproportionately in different disinformation domains. We also include a detailed case study on climate change disinformation, highlighting how linguistic, psychological, and cultural factors shape the adaptation of persuasion strategies to fit unique thematic contexts.
CLOct 28, 2024
A Survey on Automatic Credibility Assessment Using Textual Credibility Signals in the Era of Large Language ModelsIvan Srba, Olesya Razuvayevskaya, João A. Leite et al.
In the age of social media and generative AI, the ability to automatically assess the credibility of online content has become increasingly critical, complementing traditional approaches to false information detection. Credibility assessment relies on aggregating diverse credibility signals - small units of information, such as content subjectivity, bias, or a presence of persuasion techniques - into a final credibility label/score. However, current research in automatic credibility assessment and credibility signals detection remains highly fragmented, with many signals studied in isolation and lacking integration. Notably, there is a scarcity of approaches that detect and aggregate multiple credibility signals simultaneously. These challenges are further exacerbated by the absence of a comprehensive and up-to-date overview of research works that connects these research efforts under a common framework and identifies shared trends, challenges, and open problems. In this survey, we address this gap by presenting a systematic and comprehensive literature review of 175 research papers, focusing on textual credibility signals within the field of Natural Language Processing (NLP), which undergoes a rapid transformation due to advancements in Large Language Models (LLMs). While positioning the NLP research into the the broader multidisciplinary landscape, we examine both automatic credibility assessment methods as well as the detection of nine categories of credibility signals. We provide an in-depth analysis of three key categories: 1) factuality, subjectivity and bias, 2) persuasion techniques and logical fallacies, and 3) check-worthy and fact-checked claims. In addition to summarising existing methods, datasets, and tools, we outline future research direction and emerging opportunities, with particular attention to evolving challenges posed by generative AI.
CLOct 14, 2025
A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and DisinformationJoão A. Leite, Arnav Arora, Silvia Gargova et al.
Large Language Models (LLMs) can generate human-like disinformation, yet their ability to personalise such content across languages and demographics remains underexplored. This study presents the first large-scale, multilingual analysis of persona-targeted disinformation generation by LLMs. Employing a red teaming methodology, we prompt eight state-of-the-art LLMs with 324 false narratives and 150 demographic personas (combinations of country, generation, and political orientation) across four languages--English, Russian, Portuguese, and Hindi--resulting in AI-TRAITS, a comprehensive dataset of 1.6 million personalised disinformation texts. Results show that the use of even simple personalisation prompts significantly increases the likelihood of jailbreaks across all studied LLMs, up to 10 percentage points, and alters linguistic and rhetorical patterns that enhance narrative persuasiveness. Models such as Grok and GPT exhibited jailbreak rates and personalisation scores both exceeding 85%. These insights expose critical vulnerabilities in current state-of-the-art LLMs and offer a foundation for improving safety alignment and detection strategies in multilingual and cross-demographic contexts.
CLJun 18, 2024
EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News ArticlesJoão A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva et al.
This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible / less biased sources. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest to-date resource in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns targeting certain disinformation topics. We further analyse the evolution of topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in training models to effectively distinguish between disinformation and trustworthy content in multilingual settings.
CLOct 9, 2020
Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual AnalysisJoão A. Leite, Diego F. Silva, Kalina Bontcheva et al.
Hate speech and toxic comments are a common concern of social media platform users. Although these comments are, fortunately, the minority in these platforms, they are still capable of causing harm. Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media. Previous work in automatically detecting toxic comments focus mainly in English, with very few work in languages like Brazilian Portuguese. In this paper, we propose a new large-scale dataset for Brazilian Portuguese with tweets annotated as either toxic or non-toxic or in different types of toxicity. We present our dataset collection and annotation process, where we aimed to select candidates covering multiple demographic groups. State-of-the-art BERT models were able to achieve 76% macro-F1 score using monolingual data in the binary case. We also show that large-scale monolingual data is still needed to create more accurate models, despite recent advances in multilingual approaches. An error analysis and experiments with multi-label classification show the difficulty of classifying certain types of toxic comments that appear less frequently in our data and highlights the need to develop models that are aware of different categories of toxicity.