Virgilio Almeida

CL
6 papers
43 citations
Novelty 26%
AI Score 41

6 Papers

91.4 CL Mar 16 Code
POLAR: A Per-User Association Test in Embedding Space

Pedro Bento, Arthur Buzelin, Arthur Chagas et al.

Most intrinsic association probes operate at the word, sentence, or corpus level, obscuring author-level variation. We present POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that runs in the embedding space of a lightly adapted masked language model. Authors are represented by private deterministic tokens; POLAR projects these vectors onto curated lexical axes and reports standardized effects with permutation p-values and Benjamini-Hochberg control. On a balanced bot-human Twitter benchmark, POLAR cleanly separates LLM-driven bots from organic accounts; on an extremist forum, it quantifies strong alignment with slur lexicons and reveals rightward drift over time. The method is modular to new attribute sets and provides concise, per-author diagnostics for computational social science. All code is publicly available at https://github.com/pedroaugtb/POLAR-A-Per-User-Association-Test-in-Embedding-Space.
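
The pipeline described in this abstract (project a per-author vector onto a lexical axis, standardize the effect, attach a permutation p-value, then apply Benjamini-Hochberg control across tests) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the WEAT-style effect size, the label-shuffling null, and the toy 2-D vectors are all assumptions made for illustration. The linked repository is the authoritative reference.

```python
import math
import random
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(user_vec, pole_a, pole_b, n_perm=2000, seed=0):
    """Standardized effect and two-sided permutation p-value for one author.

    Assumed (hypothetical) design: the effect is the difference in mean
    cosine similarity to the two poles of an axis, divided by the standard
    deviation of all similarities; the null shuffles pole labels.
    """
    rng = random.Random(seed)
    sims = [cosine(user_vec, w) for w in list(pole_a) + list(pole_b)]
    n_a = len(pole_a)

    def diff(values):
        return statistics.mean(values[:n_a]) - statistics.mean(values[n_a:])

    observed = diff(sims)
    effect = observed / statistics.stdev(sims)

    shuffled = sims[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(diff(shuffled)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return effect, (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real author and word vectors would come from the model.
    author = [1.0, 0.0]
    pole_a = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]
    pole_b = [[0.0, 1.0], [0.1, 0.9], [0.1, 1.0]]
    effect, p = association(author, pole_a, pole_b)
    print(effect, p, benjamini_hochberg([p, 0.2, 0.5]))
```

With only three words per pole there are just 20 distinct label assignments, so the permutation p-value cannot become very small; realistic lexicons would be considerably larger.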

CL Aug 9, 2024
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil

Marcelo Sartori Locatelli, Matheus Prado Miranda, Igor Joaquim da Silva Costa et al.

The Exame Nacional do Ensino Médio (ENEM) is a pivotal test for Brazilian students, required for admission to a significant number of universities in Brazil. The test consists of four objective high-school level tests on Math, Humanities, Natural Sciences and Languages, and one written essay. Students' answers to the test and to the accompanying socioeconomic status questionnaire are made public every year (albeit anonymized) due to transparency policies from the Brazilian Government. In the context of large language models (LLMs), these data lend themselves nicely to comparing different groups of humans with AI, as we have access to both human and machine answer distributions. We leverage these characteristics of the ENEM dataset and compare GPT-3.5 and 4, and MariTalk, a model trained using Portuguese data, to humans, aiming to ascertain how their answers relate to real societal groups and what that may reveal about the models' biases. We divide the human groups by socioeconomic status (SES) and compare their answer distributions with those of the LLMs for each question and for the essay. We find no significant biases when comparing LLM performance to humans on the multiple-choice Brazilian Portuguese tests, as the distance between model and human answers is mostly determined by human accuracy. The generated text leads to a similar conclusion: when analyzing the essays, we observe that human and LLM essays differ in a few key factors, one being word choice, where model essays were easily separable from human ones. The texts also differ syntactically, with LLM-generated essays exhibiting, on average, shorter sentences and fewer thought units, among other differences. These results suggest that, for Brazilian Portuguese in the ENEM context, LLM outputs represent no group of humans, being significantly different from the answers of Brazilian students across all tests.
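
The per-question comparison of answer distributions described in this abstract can be sketched as follows. The abstract does not name the distance it uses, so total variation distance is an assumption here, as are the five-option answer key, the function names, and the toy data.

```python
from collections import Counter

# Assumed answer key: five alternatives per multiple-choice question.
OPTIONS = "ABCDE"


def answer_distribution(answers):
    """Relative frequency of each option among the given answers."""
    counts = Counter(answers)
    total = len(answers)
    return [counts[option] / total for option in OPTIONS]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def closest_group(model_answers, group_answers):
    """Return the human group whose answers are nearest to the model's.

    `group_answers` maps a (hypothetical) SES group label to the list of
    answers its members gave to a single question.
    """
    model_dist = answer_distribution(model_answers)
    distances = {
        group: total_variation(model_dist, answer_distribution(answers))
        for group, answers in group_answers.items()
    }
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    # Toy data: repeated model samples and two made-up SES groups.
    model = list("AAAB")
    groups = {"low_ses": list("ABCD"), "high_ses": list("AAAB")}
    print(closest_group(model, groups))
```

If the distance to every group is driven mainly by how often that group picks the correct option, as the abstract reports, then the nearest group reflects accuracy rather than a shared bias.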

74.7 SI Apr 8
Characterizing AI Manipulation Risks in Brazilian YouTube Climate Discourse

Wenchao Dong, Marcelo S. Locatelli, Virgilio Almeida et al.

Climate change poses a global threat to public health, food security, and economic stability. Addressing it requires evidence-based policies and a nuanced understanding of how the threat is perceived by the public, particularly within visual social media, where narratives quickly evolve through the voices of individuals, politicians, NGOs, and institutions. This study investigates climate-related discourse on YouTube within the Brazilian context, a geopolitically significant nation in global environmental negotiations. Through three case studies, we examine (1) which psychological content traits most effectively drive audience engagement, (2) the extent to which these traits influence content popularity, and (3) whether such insights can inform the design of persuasive synthetic campaigns (such as climate denialism) using recent generative language models. Another contribution of this work is the release of a large publicly available dataset of 226K Brazilian YouTube videos and 2.7M user comments on climate change. The dataset includes fine-grained annotations of persuasive strategies, theory-of-mind categorizations in user responses, and typologies of content creators. This resource can help support future research on digital climate communication and the ethical risks of algorithmically amplified narratives and generative media.

43.5 SI Apr 27
Mapping Emerging Climate Misinformation Playbooks in the Global South

Marcelo Sartori Locatelli, Wenchao Dong, Pedro Loures Alzamora et al.

Climate misinformation continues to erode support for climate action, a challenge that is especially acute in the Global South, where high climate vulnerability intersects with development pressures. In rapidly evolving digital ecosystems, misinformation adapts to platform incentives, shifting from overt rejection of climate science toward more subtle narratives that contest proposed solutions. This study integrates large-scale platform data with qualitative content analysis to examine how information systems shape contemporary climate discourse. Using a dataset of 226,775 climate-related YouTube videos from Brazil (2019-2025), we identify two dominant misinformation strategies: traditional denial that disputes scientific evidence and an emerging "new denial" that accepts climate change while undermining mitigation and adaptation policies. We find a pronounced transition to solution-focused narratives that target renewable energy, climate governance, and environmental advocates. New denial content is produced by a wider array of actors, attracts higher engagement, and employs more sophisticated persuasive techniques. These patterns disproportionately affect regions already facing structural inequities and raise broader concerns about platform accountability in unequal information environments. They also suggest the need for governance approaches capable of addressing new denial, a rapidly adapting form of harmful content that often evades existing moderation policies.

SI Aug 17, 2018
Characterizing the public perception of WhatsApp through the lens of media

Josemar Alves Caetano, Gabriel Magno, Evandro Cunha et al.

WhatsApp is, as of 2018, a significant component of the global information and communication infrastructure, especially in developing countries. However, probably due to its strong end-to-end encryption, WhatsApp became an attractive place for the dissemination of misinformation, extremism and other forms of undesirable behavior. In this paper, we investigate the public perception of WhatsApp through the lens of media. We analyze two large datasets of news and show the kind of content that is being associated with WhatsApp in different regions of the world and over time. Our analyses include the examination of named entities, general vocabulary, and topics addressed in news articles that mention WhatsApp, as well as the polarity of these texts. Among other results, we demonstrate that the vocabulary and topics around the term "whatsapp" in the media have been changing over the years and in 2018 concentrate on matters related to misinformation, politics and criminal scams. More generally, our findings are useful for understanding the role that tools like WhatsApp play in contemporary society and how they are seen by the communities themselves.

CL Jul 18, 2018
Fake news as we feel it: perception and conceptualization of the term "fake news" in the media

Evandro Cunha, Gabriel Magno, Josemar Caetano et al.

In this article, we quantitatively analyze how the term "fake news" has been shaped in news media in recent years. We study the perception and the conceptualization of this term in the traditional media using eight years of data collected from news outlets based in 20 countries. Our results not only corroborate previous indications of a sharp increase in the usage of the expression "fake news", but also show contextual changes around this expression after the United States presidential election of 2016. Among other results, we found changes in the related vocabulary, in the mentioned entities, in the surrounding topics and in the contextual polarity around the term "fake news", suggesting that this expression underwent a change in perception and conceptualization after 2016. These outcomes expand the understanding of the usage of the term "fake news", helping to comprehend and more accurately characterize this relevant social phenomenon linked to misinformation and manipulation.