Letizia Iannucci

SI
3 papers
28 citations
Novelty 20%
AI Score 41

3 Papers

45.1 · SI · Mar 26
Beyond Disinformation: Strategic Misrepresentation across Content, Actors, Processes, and Covertness

Arttu Malkamäki, Daniel Balinhas, Letizia Iannucci et al.

This article revisits the widely studied problem of disinformation and related phenomena in online social networks (OSNs) by reframing it as a broader problem of misrepresentation. While disinformation is commonly understood as the intentional spread of false content, the term is applied inconsistently and often remains narrowly content-focused. This obscures other forms of manipulation, such as coordinated behavior that distorts the visibility, popularity or perceived legitimacy of actors and discourses without altering content itself. We argue that such limitations hinder a coherent and operational understanding of information campaigning in OSNs. To address this, we introduce strategic misrepresentation as a unifying concept capturing the interplay between content, actors and processes in shaping collective sensemaking. We formalize this concept through a four-dimensional framework encompassing content distortion, actor distortion, process distortion and covertness, reflecting how information campaigns unfold in practice and emphasizing observable behavioral signals. Building on this conceptualization, we conduct an integrative survey of state-of-the-art detection techniques across machine learning, network science and visual analytics. By synthesizing these approaches, we demonstrate how they jointly operationalize strategic misrepresentation in a data-driven manner. Our work provides a novel, pragmatic foundation for detecting, classifying, and evaluating legitimate and illegitimate information campaigns within and across OSNs.
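One observable behavioral signal of process distortion is coordinated co-sharing: several accounts pushing the same item within a short time window, repeatedly. The sketch below illustrates a common heuristic from the coordination-detection literature; it is not the method proposed in the article. The input format (account, item, timestamp tuples), the function name and the `window_seconds` and `min_shared` thresholds are hypothetical choices for illustration.

```python
from collections import defaultdict
from itertools import combinations


def co_sharing_pairs(shares, window_seconds=60, min_shared=2):
    """Hypothetical heuristic: return account pairs that shared the same item
    within `window_seconds` of each other on at least `min_shared` distinct items.

    `shares` is an iterable of (account_id, item_id, unix_timestamp) tuples.
    """
    # Group share events by the item (URL, hashtag, post) being shared.
    by_item = defaultdict(list)
    for account, item, ts in shares:
        by_item[item].append((ts, account))

    # For each account pair, collect the distinct items they co-shared quickly.
    pair_items = defaultdict(set)
    for item, events in by_item.items():
        events.sort()
        for (t1, a1), (t2, a2) in combinations(events, 2):
            if a1 != a2 and abs(t2 - t1) <= window_seconds:
                pair = tuple(sorted((a1, a2)))
                pair_items[pair].add(item)

    # Keep only pairs whose rapid co-sharing recurs across enough items.
    return {pair: len(items) for pair, items in pair_items.items()
            if len(items) >= min_shared}


shares = [
    ("a", "u1", 0), ("b", "u1", 30),                     # a and b share u1 30 s apart
    ("a", "u2", 100), ("b", "u2", 120), ("c", "u2", 5000),  # c shares u2 much later
]
print(co_sharing_pairs(shares))  # → {('a', 'b'): 2}
```

The pairwise comparison is quadratic in the number of shares per item, which is acceptable for a sketch; at scale, a sliding window over the sorted events would avoid comparing events that are far apart in time.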

47.8 · SI · Apr 30 · Code
Social Media Data Toolkit: Standardization and Anonymization of Social Network Datasets

Ali Najafi, Letizia Iannucci, Mikko Kivelä et al.

The rapid diversification of social media platforms and the increasing restrictions on official APIs have significantly complicated cross-platform analysis. Researchers are often forced to rely on heterogeneous datasets obtained through web scraping and historical archives; however, these datasets often lack structural consistency. Before conducting cross-platform social media analyses, one needs to answer three critical questions: (1) What makes platforms different and similar? (2) How were the datasets collected? (3) How can we align the datasets of different platforms to conduct fair analyses? To address these questions, we introduce the Social Media Data Toolkit (SMDT), a comprehensive Python framework designed for the standardization, anonymization, and enrichment of social network datasets. SMDT unifies diverse data structures into a generic schema comprising Communities, Accounts, Posts, Actions, and Entities to facilitate multi-platform research. The framework features a configurable anonymization module to protect Personally Identifiable Information (PII) and an extensible enrichment layer that integrates Large Language Models (LLMs) and network analysis tools for downstream tasks such as stance detection and toxicity scoring, without requiring a separate codebase for each dataset. We demonstrate the versatility of SMDT through four case studies ranging from textual content analysis to cross-platform network analysis. To support reproducible social media research, SMDT is released as an open-source tool with detailed documentation and practical guides for researchers at any skill level. It can be accessed at github.com/ViralLab/SMDT and varollab.com/SMDT.
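The abstract names five schema entities and a PII anonymization step. The sketch below shows one way such a schema and a pseudonymization pass could look in plain Python. The class names mirror the entities listed above, but every field, the `pseudonymize`, `anonymize_account` and `anonymize_post` helpers, and the keyed-hash (HMAC-SHA256) scheme are hypothetical and are not SMDT's actual API; see the SMDT documentation for the real interfaces.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Optional


# Hypothetical field sets; SMDT's real schema may differ.
@dataclass(frozen=True)
class Community:
    community_id: str
    platform: str


@dataclass(frozen=True)
class Account:
    account_id: str
    platform: str
    username: str


@dataclass(frozen=True)
class Post:
    post_id: str
    account_id: str
    text: str
    community_id: Optional[str] = None


@dataclass(frozen=True)
class Action:
    # An account acting on a post, e.g. "share", "reply", "like".
    action_type: str
    account_id: str
    post_id: str


@dataclass(frozen=True)
class Entity:
    # Something extracted from a post, e.g. a hashtag, URL or mention.
    entity_type: str
    value: str
    post_id: str


def pseudonymize(value: str, key: bytes) -> str:
    """Map an identifier to a stable keyed hash, so the same account gets the
    same pseudonym everywhere without exposing the original identifier."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def anonymize_account(acc: Account, key: bytes) -> Account:
    # Replace both identifying fields; non-identifying fields are kept.
    return replace(acc,
                   account_id=pseudonymize(acc.account_id, key),
                   username=pseudonymize(acc.username, key))


def anonymize_post(post: Post, key: bytes) -> Post:
    # Only the author link is pseudonymized here; free text may still hold PII.
    return replace(post, account_id=pseudonymize(post.account_id, key))


key = b"project-secret"  # hypothetical per-project secret
acc = Account("12345", "twitter", "alice")
post = Post("p1", "12345", "hello")
anon_acc = anonymize_account(acc, key)
anon_post = anonymize_post(post, key)
print(anon_acc, anon_post)
```

Using a keyed hash rather than a plain hash is a deliberate choice in this sketch: without the key, an adversary cannot recompute pseudonyms from a list of known usernames, yet links between accounts, posts and actions are preserved because the mapping is deterministic.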

LG · May 19, 2023 · Code
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series

Alexander Nikitin, Letizia Iannucci, Samuel Kaski

Temporally indexed data are essential in a wide range of fields and are of interest to machine learning researchers. Time series data, however, are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations and the application of existing and new data-intensive ML methods. A possible solution to this bottleneck is to generate synthetic data. In this work, we introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series. TSGM includes a broad repertoire of machine learning methods: generative models as well as probabilistic and simulator-based approaches. The framework enables users to evaluate the quality of the produced data from different angles: similarity, downstream effectiveness, predictive consistency, diversity, and privacy. The framework is extensible, allowing researchers to rapidly implement their own methods and compare them in a shareable environment. TSGM was tested on open datasets and in production and proved beneficial in both cases. In addition to the library, the project lets users generate synthetic data through command-line interfaces, which lowers the entry threshold for those without a programming background.
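As a rough, library-independent illustration of simulator-based generation and similarity evaluation, the sketch below draws synthetic series from a simple AR(1) simulator and compares a few summary statistics of real and synthetic data. It does not use TSGM's API or reproduce its metrics; the AR(1) generator, the chosen statistics and all function names are hypothetical choices for illustration.

```python
import random
import statistics


def ar1_series(n, phi, sigma, seed=None):
    """Simulate x_t = phi * x_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out


def lag1_autocorr(xs):
    # Sample autocorrelation at lag 1 around the series mean.
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


def summary(xs):
    return {
        "mean": statistics.fmean(xs),
        "std": statistics.pstdev(xs),
        "lag1_autocorr": lag1_autocorr(xs),
    }


def similarity_gaps(real, synthetic):
    """Absolute difference of each summary statistic; smaller means more similar."""
    r, s = summary(real), summary(synthetic)
    return {k: abs(r[k] - s[k]) for k in r}


real = ar1_series(5000, phi=0.8, sigma=1.0, seed=0)   # stand-in for real data
good = ar1_series(5000, phi=0.8, sigma=1.0, seed=1)   # well-matched simulator
bad = ar1_series(5000, phi=0.1, sigma=1.0, seed=2)    # mis-specified simulator
print(similarity_gaps(real, good))
print(similarity_gaps(real, bad))
```

Matching summary statistics is a necessary but weak notion of similarity; the other axes named in the abstract (downstream effectiveness, predictive consistency, diversity, and privacy) require separate checks that this sketch does not cover.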