h-index 36
6 papers
30 citations
Novelty 53%
AI Score 50

6 Papers

50.9 CY Apr 28
Dark Personality Traits and Online Toxicity: Linking Self-Reports to Reddit Activity

Aldo Cerulli, Benedetta Tessa, Giuseppe La Selva et al.

Dark personality traits have long been associated with antisocial and toxic online behaviors, yet their relationship with observable online activity remains unclear. We investigate the association between validated dark personality measures, self-reported experiences of online incivility, and linguistic and behavioral features extracted from real-world user activity. To this end, we developed a Web application that securely links responses to validated psychological questionnaires collected via Amazon Mechanical Turk with participants' Reddit activity. This yielded a dataset of nearly 57K comments (2.2M tokens) from 114 users, represented through a broad set of linguistic and behavioral features. Our analyses reveal a clear distinction between self-reported and observed behavior. Dark personality traits show consistent associations with self-reported engagement in uncivil interactions. However, no validated dark personality dimension significantly predicts text-derived toxicity or linguistic features. In contrast, self-reported experiences of engaging in or being targeted by toxic behavior are robustly reflected in users' language, exhibiting consistent associations with measures of negativity, moral framing, and emotional intensity. Taken together, these findings highlight a gap between stable personality traits and their manifestation in surface-level linguistic signals. While computational features effectively capture behavioral engagement in online incivility, they do not provide reliable proxies for underlying personality constructs within the present framework. Our results underscore the importance of grounding computational approaches in validated psychological measures and point to the need for richer, context-aware representations to better understand the relationship between personality and online behavior.

CY Feb 6
Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents

Erika Elizabeth Taday Morocho, Lorenzo Cima, Tiziano Fagni et al.

Using persona-conditioned LLMs as synthetic survey respondents has become a common practice in computational social science and agent-based simulations. Yet, it remains unclear whether multi-attribute persona prompting improves LLM reliability or instead introduces distortions. Here we contribute to this assessment by leveraging a large dataset of U.S. microdata from the World Values Survey. Concretely, we evaluate two open-weight chat models and a random-guesser baseline across more than 70K respondent-item instances. We find that persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance. Persona effects are highly heterogeneous as most items exhibit minimal change, while a small subset of questions and underrepresented subgroups experience disproportionate distortions. Our findings highlight a key adverse impact of current persona-based simulation practices: demographic conditioning can redistribute error in ways that undermine subgroup fidelity and risk misleading downstream analyses.

CL Feb 16
A Geometric Analysis of Small-sized Language Model Hallucinations

Emanuele Ricco, Elia Onofri, Lorenzo Cima et al.

Hallucinations -- fluent but factually incorrect responses -- pose a major challenge to the reliability of language models, especially in multi-step or agentic settings. This work investigates hallucinations in small-sized LLMs from a geometric perspective. We start from the hypothesis that, when models generate multiple responses to the same prompt, genuine ones cluster more tightly in the embedding space. We prove this hypothesis and, leveraging this geometric insight, show that a consistent level of separability can be achieved. We use this latter result to introduce a label-efficient propagation method that classifies large collections of responses from just 30-50 annotations, achieving F1 scores above 90%. Our findings, which frame hallucinations from a geometric perspective in the embedding space, complement traditional knowledge-centric and single-response evaluation paradigms, paving the way for further research.

HC Dec 10, 2024
Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation

Lorenzo Cima, Alessio Miaschi, Amaury Trujillo et al.

AI-generated counterspeech offers a promising and scalable strategy to curb online toxicity through direct replies that promote civil discourse. However, current counterspeech is one-size-fits-all, lacking adaptation to the moderation context and the users involved. We propose and evaluate multiple strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user. We instruct a LLaMA2-13B model to generate counterspeech, experimenting with various configurations based on different contextual information and fine-tuning strategies. We identify the configurations that generate persuasive counterspeech through a combination of quantitative indicators and human evaluations collected via a pre-registered mixed-design crowdsourcing experiment. Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness, without compromising other characteristics. Our findings also reveal a poor correlation between quantitative indicators and human evaluations, suggesting that these methods assess different aspects and highlighting the need for nuanced evaluation methodologies. The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.

CL Feb 10, 2025
Hallucination Detection: A Probabilistic Framework Using Embeddings Distance Analysis

Emanuele Ricco, Lorenzo Cima, Roberto Di Pietro

Hallucinations are one of the major issues affecting LLMs, hindering their wide adoption in production systems. While current research solutions for detecting hallucinations are mainly based on heuristics, in this paper we introduce a mathematically sound methodology to reason about hallucination, and leverage it to build a tool to detect hallucinations. To the best of our knowledge, we are the first to show that hallucinated content differs structurally from correct content. To prove this result, we resort to Minkowski distances in the embedding space. Our findings demonstrate statistically significant differences in the embedding distance distributions, which are also scale-free: they hold qualitatively regardless of the distance norm used and the number of keywords, questions, or responses. We leverage these structural differences to develop a tool to detect hallucinated responses, achieving an accuracy of 66% for a specific configuration of system parameters, comparable with the best results in the field. In conclusion, the suggested methodology is promising and novel, possibly paving the way for further research in the domain, also along the directions highlighted in our future work.
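As an illustration of the distance family this abstract refers to, the following minimal sketch computes Minkowski distances of order p, d_p(u, v) = (sum_i |u_i - v_i|^p)^(1/p), between pairs of vectors and compares the mean pairwise distance of two sets under several norms. The function names and the toy 2-D "embeddings" are hypothetical and chosen only for illustration; this is not the authors' pipeline, and the two toy sets are not meant to represent hallucinated or correct responses.

```python
from itertools import combinations


def minkowski(u, v, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def pairwise_distances(embeddings, p=2.0):
    """All pairwise Minkowski distances within one set of embeddings."""
    return [minkowski(u, v, p) for u, v in combinations(embeddings, 2)]


def mean_distance(embeddings, p=2.0):
    """Mean pairwise distance, a simple summary of how spread out a set is."""
    distances = pairwise_distances(embeddings, p)
    return sum(distances) / len(distances)


# Toy 2-D vectors with made-up values (assumption: real embeddings would be
# high-dimensional outputs of an embedding model).
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

# Compare the two sets under several distance norms.
for p in (1.0, 2.0, 3.0):
    print(p, mean_distance(tight, p), mean_distance(spread, p))
```

In this toy case the ordering of the two sets is the same for every p tried, which is the kind of norm-independence the abstract describes qualitatively; a real analysis would compare full distance distributions with a statistical test rather than means.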

CV Sep 18, 2025
PRISM: Phase-enhanced Radial-based Image Signature Mapping framework for fingerprinting AI-generated images

Emanuele Ricco, Elia Onofri, Lorenzo Cima et al.

A critical need has emerged for generative AI: attribution methods, that is, solutions that can identify the model originating AI-generated content. This feature, generally relevant in multimodal applications, is especially sensitive in commercial settings where users subscribe to paid proprietary services and expect guarantees about the source of the content they receive. To address these issues, we introduce PRISM, a scalable Phase-enhanced Radial-based Image Signature Mapping framework for fingerprinting AI-generated images. PRISM is based on a radial reduction of the discrete Fourier transform that leverages amplitude and phase information to capture model-specific signatures. The output of this process is subsequently clustered via linear discriminant analysis to achieve reliable model attribution in diverse settings, even if the model's internal details are inaccessible. To support our work, we construct PRISM-36K, a novel dataset of 36,000 images generated by six text-to-image GAN- and diffusion-based models. On this dataset, PRISM achieves an attribution accuracy of 92.04%. We additionally evaluate our method on four benchmarks from the literature, reaching an average accuracy of 81.60%. Finally, we also evaluate our methodology on the binary task of detecting real vs fake images, achieving an average accuracy of 88.41%. We obtain our best result on GenImage with an accuracy of 95.06%, whereas the original benchmark achieved 82.20%. Our results demonstrate the effectiveness of frequency-domain fingerprinting for cross-architecture and cross-dataset model attribution, offering a viable solution for enforcing accountability and trust in generative AI systems.
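To make the idea of a radial reduction of the discrete Fourier transform more concrete, here is a minimal sketch that averages amplitude and phase over rings of equal frequency radius. Everything in it is an assumption made for illustration: the function names, the binning scheme, the naive arithmetic mean of the phase, and the tiny constant test image. It is not PRISM's actual procedure, and the linear discriminant analysis step is not shown.

```python
import cmath
import math


def dft2(image):
    """Naive 2-D DFT of a small grayscale image given as a list of rows."""
    h, w = len(image), len(image[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def radial_signature(image, n_bins=4):
    """Mean amplitude and mean phase per frequency ring, concatenated."""
    spectrum = dft2(image)
    h, w = len(spectrum), len(spectrum[0])
    # Largest possible radius; fall back to 1.0 for a 1x1 image.
    max_r = math.hypot(h // 2, w // 2) or 1.0
    amp_sum = [0.0] * n_bins
    phase_sum = [0.0] * n_bins
    counts = [0] * n_bins
    for u in range(h):
        for v in range(w):
            # Signed frequencies, so the radius is measured from the DC term.
            fu = u if u <= h // 2 else u - h
            fv = v if v <= w // 2 else v - w
            ring = min(int(math.hypot(fu, fv) / max_r * n_bins), n_bins - 1)
            amp_sum[ring] += abs(spectrum[u][v])
            # Simplification: plain mean of phase angles (a circular mean
            # may be more appropriate in practice).
            phase_sum[ring] += cmath.phase(spectrum[u][v])
            counts[ring] += 1
    amp_mean = [a / c if c else 0.0 for a, c in zip(amp_sum, counts)]
    phase_mean = [p / c if c else 0.0 for p, c in zip(phase_sum, counts)]
    return amp_mean + phase_mean


# A constant 4x4 image: all spectral energy sits in the DC (zero-frequency) term.
flat = [[1.0] * 4 for _ in range(4)]
signature = radial_signature(flat, n_bins=3)
print(signature)
```

The resulting fixed-length vector is the kind of feature a downstream classifier could consume; for realistic image sizes a fast Fourier transform routine would replace the quadratic-time `dft2` used here.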