Chiara Boldrini

LG
h-index 26
17 papers
78 citations
Novelty 38%
AI Score 50

17 Papers

72.1 · SI · Jun 4
Annotation of Positive vs Negative User Interactions for Social Sign Prediction

Biancamaria Bombino, Chiara Boldrini, Andrea Passarella et al.

Inferring the sign of social relationships from online interactions is a fundamental challenge in social network analysis. Existing approaches typically rely on sentiment analysis to label individual interactions as positive or negative, then aggregate these labels to assign a sign to the relationship. However, sentiment analysis captures the valence of the content being discussed rather than the nature of the relational exchange itself, a conflation that can lead to systematic misclassification. In this paper, we propose a methodology that addresses this limitation by leveraging Large Language Models (LLMs) in a zero-shot setting to identify interaction-level relational signals (specifically, personal praise and personal attacks directed at the interlocutor) as more direct indicators of positive and negative social ties. We evaluate four models spanning open-weight and proprietary architectures (Qwen2.5:7b, Gemma2:9b, GPT-4o, GPT-5.4-mini) across three prompt designs of increasing complexity, on two human-annotated datasets of approximately 298 and 340 texts respectively. Results show that zero-shot LLMs achieve good classification performance on both tasks without any task-specific training data, establishing a practical baseline for relational annotation. Performance differs across tasks: attack detection is robust to prompt design and model choice, while praise detection is more sensitive to both, reflecting the greater subjectivity of positive relational gestures. These findings lay the groundwork for integrating LLM-based relational annotation into sign prediction pipelines.
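The abstract frames relationship signing as a final step that aggregates interaction-level labels, without specifying the aggregation rule. A minimal sketch, assuming a simple net-count rule over a hypothetical three-label scheme (`praise`, `attack`, `neutral`); `relationship_sign` and `min_margin` are illustrative names, not the authors' method:

```python
from collections import Counter

# Hypothetical label set, e.g. as output by a zero-shot LLM classifier per interaction.
PRAISE, ATTACK, NEUTRAL = "praise", "attack", "neutral"


def relationship_sign(labels, min_margin=1):
    """Aggregate interaction-level labels into a relationship sign.

    Returns +1 if praise outnumbers attacks by at least `min_margin`,
    -1 if attacks outnumber praise by at least `min_margin`, else 0.
    Neutral interactions are ignored (an assumption of this sketch).
    """
    counts = Counter(labels)
    diff = counts[PRAISE] - counts[ATTACK]
    if diff >= min_margin:
        return 1
    if diff <= -min_margin:
        return -1
    return 0


# Toy interaction histories for three hypothetical user pairs.
edges = {
    ("alice", "bob"): [PRAISE, NEUTRAL, PRAISE],
    ("bob", "carol"): [ATTACK, NEUTRAL, PRAISE, ATTACK],
    ("carol", "dave"): [NEUTRAL, NEUTRAL],
}
signs = {pair: relationship_sign(labels) for pair, labels in edges.items()}
print(signs)  # {('alice', 'bob'): 1, ('bob', 'carol'): -1, ('carol', 'dave'): 0}
```

Raising `min_margin` makes the rule more conservative, leaving more relationships unsigned.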

61.3 · SI · Jun 1
Layered Ego Networks in Email Communication: From Enron to the Jmail Archive

Francesco Di Cursi, Chiara Boldrini, Marco Conti et al.

Email archives offer a rare view of social relationships through repeated communication, but it remains unclear how well classical ego network layering applies to digital interaction data. This paper compares two public email archives with sharply contrasting structures: Enron, a workplace corpus involving around 150 users, and Jmail, a single-ego archive centered on an exceptionally active focal actor whose communication volume is more than twenty times higher than the average Enron user. We ask, in each case, whether Dunbar-like layered organization is recoverable from email communication frequency and how it should be interpreted. For Jmail, we show that extreme communication intensity causes standard layering methods (whether clustering-based or threshold-based) to break down. Jmail is not a broad communication environment with many occasional contacts, but a selective pool of high-interest alters operating on a much higher frequency scale than ordinary email. Once the Dunbar frequency ladder is anchored to the empirical support-clique boundary, a clearer layered structure emerges. Reciprocity analysis confirms that the recovered layers reflect genuine bidirectional relationships rather than artifacts of the focal actor's outgoing activity. Enron serves as a workplace benchmark that grounds the comparison: its ego networks partially reproduce Dunbar-like organization, with stable inner circles and an outermost recovered layer corresponding to Dunbar's affinity group ($\sim50$), confirming that layered structure is recoverable from ordinary organizational email. Overall, the findings show that Dunbar-like organization can be meaningfully studied in email archives, but that selective high-frequency archives require frequency normalization before the layered structure becomes interpretable.

SI · Mar 1, 2022
Structural invariants and semantic fingerprints in the "ego network" of words

Kilian Ollivier, Chiara Boldrini, Andrea Passarella et al.

Well-established cognitive models coming from anthropology have shown that, due to the cognitive constraints that limit our "bandwidth" for social interactions, humans organize their social relations according to a regular structure. In this work, we postulate that similar regularities can be found in other cognitive processes, such as those involving language production. In order to investigate this claim, we analyse a dataset containing tweets of a heterogeneous group of Twitter users (regular users and professional writers). Leveraging a methodology similar to the one used to uncover the well-established social cognitive constraints, we find regularities at both the structural and semantic level. At the former, we find that a concentric layered structure (which we call ego network of words, in analogy to the ego network of social relationships) very well captures how individuals organise the words they use. The size of the layers in this structure regularly grows (approximately 2-3 times with respect to the previous one) when moving outwards, and the two penultimate external layers consistently account for approximately 60% and 30% of the used words, irrespective of the total number of layers of the user. For the semantic analysis, each ring of each ego network is described by a semantic profile, which captures the topics associated with the words in the ring. We find that ring #1 has a special role in the model. It is semantically the most dissimilar and the most diverse among the rings. We also show that the topics that are important in the innermost ring also have the characteristic of being predominant in each of the other rings, as well as in the entire ego network. In this respect, ring #1 can be seen as the semantic fingerprint of the ego network of words.

LG · Jul 29, 2023
The effect of network topologies on fully decentralized learning: a preliminary investigation

Luigi Palmieri, Lorenzo Valerio, Chiara Boldrini et al.

In a decentralized machine learning system, data is typically partitioned among multiple devices or nodes, each of which trains a local model using its own data. These local models are then shared and combined to create a global model that can make accurate predictions on new data. In this paper, we start exploring the role of the network topology connecting nodes on the performance of a Machine Learning model trained through direct collaboration between nodes. We investigate how different types of topologies impact the "spreading of knowledge", i.e., the ability of nodes to incorporate in their local model the knowledge derived by learning patterns in data available in other nodes across the network. Specifically, we highlight the different roles in this process of more or less connected nodes (hubs and leaves), as well as that of macroscopic network properties (primarily, degree distribution and modularity). Among others, we show that, while it is known that even weak connectivity among network components is sufficient for information spread, it may not be sufficient for knowledge spread. More intuitively, we also find that hubs have a more significant role than leaves in spreading knowledge, although this manifests itself not only for heavy-tailed distributions but also when "hubs" have only moderately more connections than leaves. Finally, we show that tightly knit communities severely hinder knowledge spread.

LG · Oct 4, 2023
Exploring the Impact of Disrupted Peer-to-Peer Communications on Fully Decentralized Learning in Disaster Scenarios

Luigi Palmieri, Chiara Boldrini, Lorenzo Valerio et al.

Fully decentralized learning enables the distribution of learning resources and decision-making capabilities across multiple user devices or nodes, and is rapidly gaining popularity due to its privacy-preserving and decentralized nature. Importantly, this crowdsourcing of the learning process allows the system to continue functioning even if some nodes are affected or disconnected. In a disaster scenario, communication infrastructure and centralized systems may be disrupted or completely unavailable, hindering the possibility of carrying out standard centralized learning tasks in these settings. Thus, fully decentralized learning can help in this case. However, transitioning from centralized to peer-to-peer communications introduces a dependency between the learning process and the topology of the communication graph among nodes. In a disaster scenario, even peer-to-peer communications are susceptible to abrupt changes, such as devices running out of battery or getting disconnected from others due to their position. In this study, we investigate the effects of various disruptions to peer-to-peer communications on decentralized learning in a disaster setting. We examine the resilience of a decentralized learning process when a subset of devices drop from the process abruptly. To this end, we analyze the difference between losing devices holding data, i.e., potential knowledge, vs. devices contributing only to the graph connectivity, i.e., with no data. Our findings on a Barabasi-Albert graph topology, where training data is distributed across nodes in an IID fashion, indicate that the accuracy of the learning process is more affected by a loss of connectivity than by a loss of data. Nevertheless, the network remains relatively robust, and the learning process can achieve a good level of accuracy.
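To make the connectivity side of this experiment concrete, the sketch below builds a Barabasi-Albert graph and measures how much of it stays connected when devices drop out. It is a structural illustration only, under assumed parameters (200 nodes, `m = 2`, 10% of nodes removed); it does not simulate training, data placement, or the accuracy results reported in the paper, and the helper names are hypothetical.

```python
import random
from collections import deque


def barabasi_albert(n, m, seed=0):
    """Undirected Barabasi-Albert graph as an adjacency dict (preferential attachment)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Seed graph: a clique on the first m + 1 nodes (each has degree m).
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `targets` once per incident edge, so uniform sampling
    # from this list picks nodes with probability proportional to their degree.
    targets = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            targets.extend([new, t])
    return adj


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting the `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over surviving nodes
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


adj = barabasi_albert(200, 2, seed=42)
k = 20  # drop 10% of the nodes (hypothetical disruption level)
hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
random_nodes = random.Random(1).sample(sorted(adj), k)
print("intact:", largest_component_size(adj, []))  # intact: 200
print("top-degree nodes removed:", largest_component_size(adj, hubs))
print("random nodes removed:", largest_component_size(adj, random_nodes))
```

Comparing hub removal with random removal gives a rough sense of how much of the connectivity loss depends on which devices fail.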

37.0 · SI · Mar 16
Cascade-driven opinion dynamics on social networks

Elisabetta Biondi, Chiara Boldrini, Andrea Passarella et al.

Online social networks (OSNs) have transformed the way individuals fulfill their social needs and consume information. As OSNs become increasingly prominent sources for news dissemination, individuals often encounter content that influences their opinions through both direct interactions and broader network dynamics. In this paper, we propose the Friedkin-Johnsen on Cascade (FJC) model, which is, to the best of our knowledge, the first attempt to integrate information cascades and opinion dynamics, specifically using the widely used Friedkin-Johnsen model. Our model, validated over real social cascades, highlights how the convergence of socialization and sharing news on these platforms can disrupt opinion evolution dynamics typically observed in offline settings. Our findings demonstrate that these cascades can amplify the influence of central opinion leaders, making them more resistant to divergent viewpoints, even when challenged by a critical mass of dissenting opinions. This research underscores the importance of understanding the interplay between social dynamics and information flow in shaping public discourse in the digital age.
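For reference, the classic Friedkin-Johnsen model on which FJC builds updates each agent's opinion as a mix of its neighbours' current opinions and its own initial opinion (prejudice). The sketch below implements only that standard update; the coupling with information cascades that defines FJC is not shown, and the three-agent example is a toy:

```python
def fj_step(x, x0, W, lam):
    """One Friedkin-Johnsen update.

    x_i(t+1) = lam_i * sum_j W[i][j] * x_j(t) + (1 - lam_i) * x0_i,
    where W is row-stochastic and lam_i in [0, 1] is agent i's susceptibility.
    """
    n = len(x)
    return [
        lam[i] * sum(W[i][j] * x[j] for j in range(n)) + (1 - lam[i]) * x0[i]
        for i in range(n)
    ]


def fj_run(x0, W, lam, steps=200):
    """Iterate the update from the initial opinions x0 for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = fj_step(x, x0, W, lam)
    return x


# Toy example: agent 0 is fully stubborn (lam = 0); agents 1 and 2 listen
# to each other and to agent 0 with equal weights.
x0 = [1.0, 0.0, 0.0]
W = [[1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
lam = [0.0, 0.8, 0.8]
print(fj_run(x0, W, lam))  # approx. [1.0, 0.6667, 0.6667]
```

Because agents 1 and 2 keep a 20% anchor to their initial opinion of 0, they settle at 2/3 rather than fully adopting the stubborn agent's opinion.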

LG · Jan 16
DecHW: Heterogeneous Decentralized Federated Learning Exploiting Second-Order Information

Adnan Ahmad, Chiara Boldrini, Lorenzo Valerio et al.

Decentralized Federated Learning (DFL) is a serverless collaborative machine learning paradigm where devices collaborate directly with neighbouring devices to exchange model information for learning a generalized model. However, variations in individual experiences and different levels of device interactions lead to data and model initialization heterogeneities across devices. Such heterogeneities cause variations in local model parameters across devices, which lead to slower convergence. This paper tackles data and model heterogeneity by explicitly accounting for the varying, parameter-level evidential credence of local models. A novel aggregation approach is introduced that captures these parameter variations in local models and performs robust aggregation of neighbourhood local updates. Specifically, consensus weights are generated via an approximation of the second-order information of local models on their local datasets. These weights are used to scale neighbourhood updates before aggregating them into a global neighbourhood representation. In extensive experiments with computer vision tasks, the proposed approach shows strong generalizability of local models at reduced communication costs.
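The abstract does not give DecHW's exact weighting, but the general idea of scaling neighbour updates by second-order information can be illustrated with per-parameter, curvature-weighted averaging. The sketch below assumes a diagonal empirical Fisher (mean of squared per-sample gradients) as the curvature estimate; this is a generic technique swapped in for illustration, not the paper's aggregation rule, and all names are hypothetical:

```python
def empirical_fisher_diag(per_sample_grads):
    """Diagonal empirical Fisher: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[k] ** 2 for g in per_sample_grads) / n for k in range(dim)]


def curvature_weighted_average(params, curvatures):
    """Per-parameter weighted average of neighbour models.

    params[i][k] is parameter k of model i; curvatures[i][k] >= 0 is that
    model's diagonal curvature estimate, used as its confidence in parameter k.
    Falls back to a plain mean where all curvatures are zero.
    """
    dim = len(params[0])
    out = []
    for k in range(dim):
        den = sum(c[k] for c in curvatures)
        if den > 0:
            out.append(sum(p[k] * c[k] for p, c in zip(params, curvatures)) / den)
        else:
            out.append(sum(p[k] for p in params) / len(params))
    return out


# Two neighbouring models with two parameters each (toy numbers).
params = [[1.0, 0.0], [3.0, 4.0]]
curv = [empirical_fisher_diag([[1.0, 1.0], [2.0, 1.0]]),   # -> [2.5, 1.0]
        empirical_fisher_diag([[1.0, 1.0], [1.0, -1.0]])]  # -> [1.0, 1.0]
print(curvature_weighted_average(params, curv))  # approx. [1.571, 2.0]
```

Parameters that a neighbour's data constrains strongly (high curvature) pull the aggregate towards that neighbour's value, while weakly constrained parameters contribute little.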

CL · Nov 28, 2025 · Code
Mind Reading or Misreading? LLMs on the Big Five Personality Test

Francesco Di Cursi, Chiara Boldrini, Marco Conti et al.

We evaluate large language models (LLMs) for automatic personality prediction from text (APPT) under the binary Five Factor Model (BIG5). Five models -- including GPT-4 and lightweight open-source alternatives -- are tested across three heterogeneous datasets (Essays, MyPersonality, Pandora) and two prompting strategies (minimal vs. enriched with linguistic and psychological cues). Enriched prompts reduce invalid outputs and improve class balance, but also introduce a systematic bias toward predicting trait presence. Performance varies substantially: Openness and Agreeableness are relatively easier to detect, while Extraversion and Neuroticism remain challenging. Although open-source models sometimes approach GPT-4 and prior benchmarks, no configuration yields consistently reliable predictions in zero-shot binary settings. Moreover, aggregate metrics such as accuracy and macro-F1 mask significant asymmetries, with per-class recall offering clearer diagnostic value. These findings show that current out-of-the-box LLMs are not yet suitable for APPT, and that careful coordination of prompt design, trait framing, and evaluation metrics is essential for interpretable results.
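The point about aggregate metrics can be seen with a toy example: a model biased towards predicting trait presence can score well on accuracy while completely missing the negative class. The labels below are invented for illustration and are not taken from the paper's datasets:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred, classes=(0, 1)):
    """Recall of each class: fraction of its true instances predicted correctly."""
    recall = {}
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c == p)
        recall[c] = hits / support if support else float("nan")
    return recall


# Toy labels: 1 = trait present, 0 = trait absent.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10  # a degenerate model that always predicts "present"
print(accuracy(y_true, y_pred))          # 0.8
print(per_class_recall(y_true, y_pred))  # {0: 0.0, 1: 1.0}
```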

LG · Feb 28, 2024
Impact of network topology on the performance of Decentralized Federated Learning

Luigi Palmieri, Chiara Boldrini, Lorenzo Valerio et al.

Fully decentralized learning is gaining momentum for training AI models at the Internet's edge, addressing infrastructure challenges and privacy concerns. In a decentralized machine learning system, data is distributed across multiple nodes, with each node training a local model based on its respective dataset. The local models are then shared and combined to form a global model capable of making accurate predictions on new data. Our exploration focuses on how different types of network structures influence the spreading of knowledge - the process by which nodes incorporate insights gained from learning patterns in data available on other nodes across the network. Specifically, this study investigates the intricate interplay between network structure and learning performance using three network topologies and six data distribution methods. These methods consider different vertex properties, including degree centrality, betweenness centrality, and clustering coefficient, along with whether nodes exhibit high or low values of these metrics. Our findings underscore the significance of global centrality metrics (degree, betweenness) in correlating with learning performance, while local clustering proves less predictive. We highlight the challenges in transferring knowledge from peripheral to central nodes, attributed to a dilution effect during model aggregation. Additionally, we observe that central nodes exert a pull effect, facilitating the spread of knowledge. In examining degree distribution, hubs in Barabasi-Albert networks positively impact learning for central nodes but exacerbate dilution when knowledge originates from peripheral nodes. Finally, we demonstrate the formidable challenge of knowledge circulation outside of segregated communities.

LG · Dec 7, 2023
Coordination-free Decentralised Federated Learning on Complex Networks: Overcoming Heterogeneity

Lorenzo Valerio, Chiara Boldrini, Andrea Passarella et al.

Federated Learning (FL) is a well-known framework for successfully performing a learning task in an edge computing scenario where the devices involved have limited resources and incomplete data representation. The basic assumption of FL is that the devices communicate directly or indirectly with a parameter server that centrally coordinates the whole process, overcoming several challenges associated with it. However, in highly pervasive edge scenarios, the presence of a central controller that oversees the process cannot always be guaranteed, and the interactions (i.e., the connectivity graph) between devices might not be predetermined, resulting in a complex network structure. Moreover, the heterogeneity of data and devices further complicates the learning process. This poses new challenges from a learning standpoint that we address by proposing a communication-efficient Decentralised Federated Learning (DFL) algorithm able to cope with them. Our solution allows devices communicating only with their direct neighbours to train an accurate model, overcoming the heterogeneity induced by data and different training histories. Our results show that the resulting local models generalise better than those trained with competing approaches, and do so in a more communication-efficient way.

LG · Mar 23, 2024
Initialisation and Network Effects in Decentralised Federated Learning

Arash Badie-Modiri, Chiara Boldrini, Lorenzo Valerio et al.

Fully decentralised federated learning enables collaborative training of individual machine learning models on a distributed network of communicating devices while keeping the training data localised on each node. This approach avoids central coordination, enhances data privacy and eliminates the risk of a single point of failure. Our research highlights that the effectiveness of decentralised federated learning is significantly influenced by the network topology of connected devices and the learning models' initial conditions. We propose a strategy for uncoordinated initialisation of the artificial neural networks based on the distribution of eigenvector centralities of the underlying communication network, leading to a radically improved training efficiency. Additionally, our study explores the scaling behaviour and the choice of environmental parameters under our proposed initialisation strategy. This work paves the way for more efficient and scalable artificial neural network training in a distributed and uncoordinated environment, offering a deeper understanding of the intertwining roles of network structure and learning dynamics.
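The proposed initialisation depends on the distribution of eigenvector centralities of the communication network. How the authors map centralities to initial weights is not described here, so the sketch below covers only the first ingredient: computing eigenvector centrality by power iteration on an adjacency dict (a hypothetical representation).

```python
def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power iteration for eigenvector centrality on an undirected graph.

    Iterates x <- (A + I) x and normalises; the shift by I has the same leading
    eigenvector as A but avoids oscillation on bipartite graphs.
    """
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x


# Star graph: node 0 is the hub, nodes 1-4 are leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
c = eigenvector_centrality(star)
print(round(c[0], 4), round(c[1], 4))  # 0.7071 0.3536
```

In practice, a graph library such as NetworkX offers the same measure via `networkx.eigenvector_centrality`.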

LG · May 3, 2024
Robustness of Decentralised Learning to Nodes and Data Disruption

Luigi Palmieri, Chiara Boldrini, Lorenzo Valerio et al.

In the vibrant landscape of AI research, decentralised learning is gaining momentum. Decentralised learning allows individual nodes to keep data locally where they are generated and to share knowledge extracted from local data among themselves through an interactive process of collaborative refinement. This paradigm supports scenarios where data cannot leave local nodes due to privacy or sovereignty reasons or real-time constraints imposing proximity of models to locations where inference has to be carried out. The distributed nature of decentralised learning implies significant new research challenges with respect to centralised learning. Among them, in this paper, we focus on robustness issues. Specifically, we study the effect of nodes' disruption on the collective learning process. Assuming a given percentage of "central" nodes disappear from the network, we focus on different cases, characterised by (i) different distributions of data across nodes and (ii) different times when disruption occurs with respect to the start of the collaborative learning task. Through these configurations, we are able to show the non-trivial interplay between the properties of the network connecting nodes, the persistence of knowledge acquired collectively before disruption or lack thereof, and the effect of data availability pre- and post-disruption. Our results show that decentralised learning processes are remarkably robust to network disruption. As long as even minimal amounts of data remain available somewhere in the network, the learning process is able to recover from disruptions and achieve significant classification accuracy. This clearly varies depending on the remaining connectivity after disruption, but we show that even nodes that remain completely isolated can retain significant knowledge acquired before the disruption.

AI · Oct 9, 2025
DODO: Causal Structure Learning with Budgeted Interventions

Matteo Gregorini, Chiara Boldrini, Lorenzo Valerio

Artificial Intelligence has achieved remarkable advancements in recent years, yet much of its progress relies on identifying increasingly complex correlations. Enabling causality awareness in AI has the potential to enhance its performance by enabling a deeper understanding of the underlying mechanisms of the environment. In this paper, we introduce DODO, an algorithm defining how an Agent can autonomously learn the causal structure of its environment through repeated interventions. We assume a scenario where an Agent interacts with a world governed by a causal Directed Acyclic Graph (DAG), which dictates the system's dynamics but remains hidden from the Agent. The Agent's task is to accurately infer the causal DAG, even in the presence of noise. To achieve this, the Agent performs interventions, leveraging causal inference techniques to analyze the statistical significance of observed changes. Results show better performance for DODO, compared to observational approaches, in all but the most limited resource conditions. DODO is often able to reconstruct the structure of the causal graph with zero errors. In the most challenging configuration, DODO outperforms the best baseline by +0.25 F1 points.
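DODO itself is not reproduced here, but the core idea of detecting causal effects from interventions plus a significance test can be sketched. The example assumes a hypothetical linear model in which X causes Y and Z is independent, intervenes on X, and flags variables whose mean shifts according to Welch's t statistic with an arbitrary threshold of 4:

```python
import random
import statistics


def sample(n, rng, do_x=None):
    """Hypothetical linear SEM: X -> Y, and Z independent of both.

    Passing do_x sets X by intervention instead of sampling it.
    """
    rows = []
    for _ in range(n):
        x = do_x if do_x is not None else rng.gauss(0.0, 1.0)
        rows.append({"X": x,
                     "Y": 2.0 * x + rng.gauss(0.0, 1.0),
                     "Z": rng.gauss(0.0, 1.0)})
    return rows


def welch_t(a, b):
    """Welch's t statistic for a difference in means between samples a and b."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / (va / len(a) + vb / len(b)) ** 0.5


def effects_of_intervening_on_x(do_value=2.0, n=500, threshold=4.0, seed=0):
    """Flag variables whose mean shifts significantly under do(X = do_value)."""
    rng = random.Random(seed)
    observational = sample(n, rng)
    interventional = sample(n, rng, do_x=do_value)
    return {
        var: abs(welch_t([r[var] for r in interventional],
                         [r[var] for r in observational])) > threshold
        for var in ("Y", "Z")
    }


print(effects_of_intervening_on_x())
```

With this setup Y should be flagged and Z should not. An intervention on X shifts all of its descendants, not only its direct children, so recovering individual edges of the DAG requires combining several interventions.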

LG · Feb 25, 2025
The Built-In Robustness of Decentralized Federated Averaging to Bad Data

Samuele Sabella, Chiara Boldrini, Lorenzo Valerio et al.

Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. In this setting, local data remains private, but its quality and quantity can vary significantly across nodes. The extent to which a fully decentralized system is vulnerable to poor-quality or corrupted data remains unclear, but several factors could contribute to potential risks. Without a central authority, there can be no unified mechanism to detect or correct errors, and each node operates with a localized view of the data distribution, making it difficult for the node to assess whether its perspective aligns with the true distribution. Moreover, models trained on low-quality data can propagate through the network, amplifying errors. To explore the impact of low-quality data on DFL, we simulate two scenarios with degraded data quality -- one where the corrupted data is evenly distributed in a subset of nodes and one where it is concentrated on a single node -- using a decentralized implementation of FedAvg. Our results reveal that averaging-based decentralized learning is remarkably robust to localized bad data, even when the corrupted data resides in the most influential nodes of the network. Counterintuitively, this robustness is further enhanced when the corrupted data is concentrated on a single node, regardless of its centrality in the communication network topology. This phenomenon is explained by the averaging process, which ensures that no single node -- however central -- can disproportionately influence the overall learning process.
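A minimal sketch of one aggregation step in a decentralized FedAvg-style scheme, assuming each node averages its own and its neighbours' parameters weighted by local dataset size (the paper's exact implementation may differ; local training is omitted). The toy path graph shows how an outlying model on one node is diluted as it spreads:

```python
def decentralized_fedavg_round(models, sizes, adj):
    """One aggregation step of a decentralized FedAvg-style scheme.

    Each node replaces its parameters with the average of its own and its
    neighbours' parameters, weighted by local dataset size. Local training
    between rounds is omitted.
    """
    new_models = {}
    for v in models:
        group = [v] + sorted(adj[v])
        total = sum(sizes[u] for u in group)
        dim = len(models[v])
        new_models[v] = [
            sum(sizes[u] * models[u][k] for u in group) / total for k in range(dim)
        ]
    return new_models


# Path graph 0 - 1 - 2; node 2 holds an outlying (e.g. corrupted) model.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
sizes = {0: 100, 1: 100, 2: 100}
models = {0: [0.0], 1: [0.0], 2: [9.0]}
print(decentralized_fedavg_round(models, sizes, adj))  # {0: [0.0], 1: [3.0], 2: [4.5]}
```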

LG · Sep 20, 2021
Weak Signals in the Mobility Landscape: Car Sharing in Ten European Cities

Chiara Boldrini, Raffaele Bruno, Haitam Laarabi

Car sharing is one of the pillars of a smart transportation infrastructure, as it is expected to reduce traffic congestion, parking demands and pollution in our cities. From the point of view of demand modelling, car sharing is a weak signal in the city landscape: only a small percentage of the population uses it, and thus it is difficult to study reliably with traditional techniques such as household travel diaries. In this work, we depart from these traditional approaches and we leverage web-based, digital records about vehicle availability in 10 European cities for one of the major active car sharing operators. We discuss which sociodemographic and urban activity indicators are associated with variations in car sharing demand, which forecasting approach (among the most popular in the related literature) is better suited to predict pickup and drop-off events, and how the spatio-temporal information about vehicle availability can be used to infer how different zones in a city are used by customers. We conclude the paper by presenting a direct application of the analysis of the dataset, aimed at identifying where to locate maintenance facilities within the car sharing operation area.

SI · Sep 19, 2021
Harnessing the Power of Ego Network Layers for Link Prediction in Online Social Networks

Mustafa Toprak, Chiara Boldrini, Andrea Passarella et al.

Being able to recommend links between users in online social networks is important for users to connect with like-minded individuals as well as for the platforms themselves and third parties leveraging social media information to grow their business. Predictions are typically based on unsupervised or supervised learning, often leveraging simple yet effective graph topological information, such as the number of common neighbors. However, we argue that richer information about the personal social structure of individuals might lead to better predictions. In this paper, we propose to leverage well-established social cognitive theories to improve link prediction performance. According to these theories, individuals arrange their social relationships along, on average, five concentric circles of decreasing intimacy. We postulate that relationships in different circles have different importance in predicting new links. In order to validate this claim, we focus on popular feature-extraction prediction algorithms (both unsupervised and supervised) and we extend them to include social-circles awareness. We validate the prediction performance of these circle-aware algorithms against several benchmarks (including their baseline versions as well as node-embedding- and GNN-based link prediction), leveraging two Twitter datasets comprising a community of video gamers and generic users. We show that social-awareness generally provides significant improvements in the prediction performance, also beating state-of-the-art solutions such as node2vec and SEAL, and without increasing the computational complexity. Finally, we show that social-awareness can be used in place of a classifier (which may be costly or impractical) for targeting a specific category of users.
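A toy illustration of circle awareness in a neighbourhood-based score: plain common neighbours gives two candidate pairs the same score, while weighting each shared alter by the circles in which both endpoints place it separates them. The weights and the product-of-weights rule are hypothetical choices, not the extension used in the paper:

```python
def common_neighbors(rings, u, v):
    """Plain common-neighbours score: number of shared alters."""
    return len(rings[u].keys() & rings[v].keys())


def circle_aware_common_neighbors(rings, u, v, ring_weight):
    """Common neighbours weighted by the circle each endpoint places them in.

    rings[x][w] is the circle (1 = innermost) in which ego x places alter w.
    ring_weight maps a circle index to a hypothetical importance weight.
    """
    shared = rings[u].keys() & rings[v].keys()
    return sum(ring_weight[rings[u][w]] * ring_weight[rings[v][w]] for w in shared)


weights = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}  # hypothetical weights
rings = {
    "u": {"a": 1, "b": 5},
    "v": {"a": 1, "b": 5},
    "x": {"c": 5, "d": 5},
    "y": {"c": 5, "d": 5},
}
for pair in (("u", "v"), ("x", "y")):
    print(pair, common_neighbors(rings, *pair),
          round(circle_aware_common_neighbors(rings, *pair, weights), 2))
# ('u', 'v') 2 1.04
# ('x', 'y') 2 0.08
```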

CY · Jul 25, 2017
Car sharing through the data analysis lens

Chiara Boldrini, Raffaele Bruno, Haitam Laarabi

Car sharing is one of the pillars of a smart transportation infrastructure, as it is expected to reduce traffic congestion, parking demands and pollution in our cities. From the point of view of demand modelling, car sharing is a weak signal in the city landscape: only a small percentage of the population uses it, and thus it is difficult to study reliably with traditional techniques such as household travel diaries. In this work, we depart from these traditional approaches and we rely on web-based, digital records about vehicle availability in 10 European cities for one of the major active car sharing operators. We discuss how vehicles are used, what the main characteristics of car sharing trips are, whether events happening in certain areas are predictable or not, and how the spatio-temporal information about vehicle availability can be used to infer how different zones in a city are used by customers. We conclude the paper by presenting a direct application of the analysis of the dataset, aimed at identifying where to locate maintenance facilities within the car sharing operational area.