Luca Pappalardo

AI
h-index63
22papers
1,384citations
Novelty39%
AI Score42

22 Papers

AIJun 23, 2023
Human-AI Coevolution

Dino Pedreschi, Luca Pappalardo, Emanuele Ferragina et al.

Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users' choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often ``unintended'' social outcomes. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., technical, epistemological, legal and socio-political.

IRFeb 18
The Diversity Paradox revisited: Systemic Effects of Feedback Loops in Recommender Systems

Gabriele Barlacchi, Margherita Lalli, Emanuele Ferragina et al.

Recommender systems shape individual choices through feedback loops in which user behavior and algorithmic recommendations coevolve over time. The systemic effects of these loops remain poorly understood, in part due to unrealistic assumptions in existing simulation studies. We propose a feedback-loop model that captures implicit feedback, periodic retraining, probabilistic adoption of recommendations, and heterogeneous recommender systems. We apply the framework on online retail and music streaming data and analyze systemic effects of the feedback loop. We find that increasing recommender adoption may lead to a progressive diversification of individual consumption, while collective demand is redistributed in model- and domain-dependent ways, often amplifying popularity concentration. Temporal analyses further reveal that apparent increases in individual diversity observed in static evaluations are illusory: when adoption is fixed and time unfolds, individual diversity consistently decreases across all models. Our results highlight the need to move beyond static evaluations and explicitly account for feedback-loop dynamics when designing recommender systems.

AIMar 7, 2022
Trajectory Test-Train Overlap in Next-Location Prediction Datasets

Massimiliano Luca, Luca Pappalardo, Bruno Lepri et al.

Next-location prediction, consisting of forecasting a user's location given their historical trajectories, has important implications in several fields, such as urban planning, geo-marketing, and disease spreading. Several predictors have been proposed in the last few years to address it, including last-generation ones based on deep learning. This paper tests the generalization capability of these predictors on public mobility datasets, stratifying the datasets by whether the trajectories in the test set also appear fully or partially in the training set. We consistently discover a severe problem of trajectory overlapping in all analyzed datasets, highlighting that predictors memorize trajectories while having limited generalization capacities. We thus propose a methodology to rerank the outputs of the next-location predictors based on spatial mobility patterns. With these techniques, we significantly improve the predictors' generalization capability, with a relative improvement on the accuracy up to 96.15% on the trajectories that cannot be memorized (i.e., low overlap with the training set).

CVMar 12, 2022
Enhancing crowd flow prediction in various spatial and temporal granularities

Marco Cardia, Massimiliano Luca, Luca Pappalardo

Thanks to the diffusion of the Internet of Things, nowadays it is possible to sense human mobility almost in real time using unconventional methods (e.g., number of bikes in a bike station). Due to the diffusion of such technologies, the last years have witnessed a significant growth of human mobility studies, motivated by their importance in a wide range of applications, from traffic management to public security and computational epidemiology. A mobility task that is becoming prominent is crowd flow prediction, i.e., forecasting aggregated incoming and outgoing flows in the locations of a geographic region. Although several deep learning approaches have been proposed to solve this problem, their usage is limited to specific types of spatial tessellations and cannot provide sufficient explanations of their predictions. We propose CrowdNet, a solution to crowd flow prediction based on graph convolutional networks. Compared with state-of-the-art solutions, CrowdNet can be used with regions of irregular shapes and provide meaningful explanations of the predicted crowd flows. We conduct experiments on public data varying the spatio-temporal granularity of crowd flows to show the superiority of our model with respect to existing methods, and we investigate CrowdNet's reliability to missing or noisy input data. Our model is a step forward in the design of reliable deep learning models to predict and explain human displacements in urban environments.

AIFeb 2
Position: Explaining Behavioral Shifts in Large Language Models Requires a Comparative Approach

Martino Ciaperoni, Marzio Di Vece, Luca Pappalardo et al.

Large-scale foundation models exhibit behavioral shifts: intervention-induced behavioral changes that appear after scaling, fine-tuning, reinforcement learning or in-context learning. While investigating these phenomena have recently received attention, explaining their appearance is still overlooked. Classic explainable AI (XAI) methods can surface failures at a single checkpoint of a model, but they are structurally ill-suited to justify what changed internally across different checkpoints and which explanatory claims are warranted about that change. We take the position that behavioral shifts should be explained comparatively: the core target should be the intervention-induced shift between a reference model and an intervened model, rather than any single model in isolation. To this aim we formulate a Comparative XAI ($Δ$-XAI) framework with a set of desiderata to be taken into account when designing proper explaining methods. To highlight how $Δ$-XAI methods work, we introduce a set of possible pipelines, relate them to the desiderata, and provide a concrete $Δ$-XAI experiment.

CYSep 26, 2024
Geospatial Road Cycling Race Results Data Set

Bram Janssens, Luca Pappalardo, Jelle De Bock et al.

The field of cycling analytics has only recently started to develop due to limited access to open data sources. Accordingly, research and data sources are very divergent, with large differences in information used across studies. To improve this, and facilitate further research in the field, we propose the publication of a data set which links thousands of professional race results from the period 2017-2023 to detailed geographic information about the courses, an essential aspect in road cycling analytics. Initial use cases are proposed, showcasing the usefulness in linking these two data sources.

AIApr 10, 2025
The Urban Impact of AI: Modeling Feedback Loops in Next-Venue Recommendation

Giovanni Mauro, Marco Minici, Luca Pappalardo

Next-venue recommender systems are increasingly embedded in location-based services, shaping individual mobility decisions in urban environments. While their predictive accuracy has been extensively studied, less attention has been paid to their systemic impact on urban dynamics. In this work, we introduce a simulation framework to model the human-AI feedback loop underpinning next-venue recommendation, capturing how algorithmic suggestions influence individual behavior, which in turn reshapes the data used to retrain the models. Our simulations, grounded in real-world mobility data, systematically explore the effects of algorithmic adoption across a range of recommendation strategies. We find that while recommender systems consistently increase individual-level diversity in visited venues, they may simultaneously amplify collective inequality by concentrating visits on a limited subset of popular places. This divergence extends to the structure of social co-location networks, revealing broader implications for urban accessibility and spatial segregation. Our framework operationalizes the feedback loop in next-venue recommendation and offers a novel lens through which to assess the societal impact of AI-assisted mobility-providing a computational tool to anticipate future risks, evaluate regulatory interventions, and inform the design of ethic algorithmic systems.

CLOct 16, 2024
Learning by Surprise: Surplexity for Mitigating Model Collapse in Generative AI

Daniele Gambetta, Gizem Gezici, Fosca Giannotti et al.

As synthetic content increasingly infiltrates the web, generative AI models may be retrained on their own outputs: a process termed "autophagy". This leads to model collapse: a progressive loss of performance and diversity across generations. Recent studies have examined the emergence of model collapse across various generative AI models and data types, and have proposed mitigation strategies that rely on incorporating human-authored content. However, current characterizations of model collapse remain limited, and existing mitigation methods assume reliable knowledge of whether training data is human-authored or AI-generated. In this paper, we address these gaps by introducing new measures that characterise collapse directly from a model's next-token probability distributions, rather than from properties of AI-generated text. Using these measures, we show that the degree of collapse depends on the complexity of the initial training set, as well as on the extent of autophagy. Our experiments prompt a new suggestion: that model collapse occurs when a model trains on data that does not "surprise" it. We express this hypothesis in terms of the well-known Free Energy Principle in cognitive science. Building on this insight, we propose a practical mitigation strategy: filtering training items by high surplexity, maximising the surprise of the model. Unlike existing methods, this approach does not require distinguishing between human- and AI-generated data. Experiments across datasets and models demonstrate that our strategy is at least as effective as human-data baselines, and even more effective in reducing distributional skewedness. Our results provide a richer understanding of model collapse and point toward more resilient approaches for training generative AI systems in environments increasingly saturated with synthetic data.

IRJun 29, 2024
A survey on the impacts of recommender systems on users, items, and human-AI ecosystems

Luca Pappalardo, Salvatore Citraro, Giuliano Cornacchia et al.

Recommendation systems and assistants (in short, recommenders) influence through online platforms most actions of our daily lives, suggesting items or providing solutions based on users' preferences or requests. This survey systematically reviews, categories, and discusses the impact of recommenders in four human-AI ecosystems -- social media, online retail, urban mapping and generative AI ecosystems. Its scope is to systematise a fast-growing field in which terminologies employed to classify methodologies and outcomes are fragmented and unsystematic. This is a crucial contribution to the literature because terminologies vary substantially across disciplines and ecosystems, hindering comparison and accumulation of knowledge in the field. We follow the customary steps of qualitative systematic review, gathering 154 articles from different disciplines to develop a parsimonious taxonomy of methodologies employed (empirical, simulation, observational, controlled), outcomes observed (concentration, content degradation, discrimination, diversity, echo chamber, filter bubble, homogenisation, polarisation, radicalisation, volume), and their level of analysis (individual, item, and ecosystem). We systematically discuss substantive and methodological commonalities across ecosystems, and highlight potential avenues for future research. The survey is addressed to scholars and practitioners interested in different human-AI ecosystems, policymakers and institutional stakeholders who want to understand better the measurable outcomes of recommenders, and tech companies who wish to obtain a systematic view of the impact of their recommenders.

LGFeb 22, 2022
Generating Synthetic Mobility Networks with Generative Adversarial Networks

Giovanni Mauro, Massimiliano Luca, Antonio Longa et al.

The increasingly crucial role of human displacements in complex societal phenomena, such as traffic congestion, segregation, and the diffusion of epidemics, is attracting the interest of scientists from several disciplines. In this article, we address mobility network generation, i.e., generating a city's entire mobility network, a weighted directed graph in which nodes are geographic locations and weighted edges represent people's movements between those locations, thus describing the entire mobility set flows within a city. Our solution is MoGAN, a model based on Generative Adversarial Networks (GANs) to generate realistic mobility networks. We conduct extensive experiments on public datasets of bike and taxi rides to show that MoGAN outperforms the classical Gravity and Radiation models regarding the realism of the generated networks. Our model can be used for data augmentation and performing simulations and what-if analysis.

AIJun 29, 2021
Coach2vec: autoencoding the playing style of soccer coaches

Paolo Cintia, Luca Pappalardo

Capturing the playing style of professional soccer coaches is a complex, and yet barely explored, task in sports analytics. Nowadays, the availability of digital data describing every relevant spatio-temporal aspect of soccer matches, allows for capturing and analyzing the playing style of players, teams, and coaches in an automatic way. In this paper, we present coach2vec, a workflow to capture the playing style of professional coaches using match event streams and artificial intelligence. Coach2vec extracts ball possessions from each match, clusters them based on their similarity, and reconstructs the typical ball possessions of coaches. Then, it uses an autoencoder, a type of artificial neural network, to obtain a concise representation (encoding) of the playing style of each coach. Our experiments, conducted on soccer-logs describing the last four seasons of the Italian first division, reveal interesting similarities between prominent coaches, paving the road to the simulation of playing styles and the quantitative comparison of professional coaches.

AIJun 1, 2021
Understanding peacefulness through the world news

Vasiliki Voukelatou, Ioanna Miliou, Fosca Giannotti et al.

Peacefulness is a principal dimension of well-being and is the way out of inequity and violence. Thus, its measurement has drawn the attention of researchers, policymakers, and peacekeepers. During the last years, novel digital data streams have drastically changed the research in this field. The current study exploits information extracted from a new digital database called Global Data on Events, Location, and Tone (GDELT) to capture peacefulness through the Global Peace Index (GPI). Applying predictive machine learning models, we demonstrate that news media attention from GDELT can be used as a proxy for measuring GPI at a monthly level. Additionally, we use explainable AI techniques to obtain the most important variables that drive the predictions. This analysis highlights each country's profile and provides explanations for the predictions, and particularly for the errors and the events that drive these errors. We believe that digital data exploited by researchers, policymakers, and peacekeepers, with data science tools as powerful as machine learning, could contribute to maximizing the societal benefits and minimizing the risks to peacefulness.

HCApr 16, 2021
An interactive dashboard for searching and comparing soccer performance scores

Paolo Cintia, Giovanni Mauro, Luca Pappalardo et al.

The performance of soccer players is one of most discussed aspects by many actors in the soccer industry: from supporters to journalists, from coaches to talent scouts. Unfortunately, the dashboards available online provide no effective way to compare the evolution of the performance of players or to find players behaving similarly on the field. This paper describes the design of a web dashboard that interacts via APIs with a performance evaluation algorithm and provides graphical tools that allow the user to perform many tasks, such as to search or compare players by age, role or trend of growth in their performance, find similar players based on their pitching behavior, change the algorithm's parameters to obtain customized performance scores. We also describe an example of how a talent scout can interact with the dashboard to find young, promising talents.

LGDec 4, 2020
A Survey on Deep Learning for Human Mobility

Massimiliano Luca, Gianni Barlacchi, Bruno Lepri et al.

The study of human mobility is crucial due to its impact on several aspects of our society, such as disease spreading, urban planning, well-being, pollution, and more. The proliferation of digital mobility data, such as phone records, GPS traces, and social media posts, combined with the predictive power of artificial intelligence, triggered the application of deep learning to human mobility. Existing surveys focus on single tasks, data sources, mechanistic or traditional machine learning approaches, while a comprehensive description of deep learning solutions is missing. This survey provides a taxonomy of mobility tasks, a discussion on the challenges related to each task and how deep learning may overcome the limitations of traditional models, a description of the most relevant solutions to the mobility tasks described above and the relevant challenges for the future. Our survey is a guide to the leading deep learning solutions to next-location prediction, crowd flow prediction, trajectory generation, and flow generation. At the same time, it helps deep learning scientists and practitioners understand the fundamental concepts and the open challenges of the study of human mobility.

LGDec 1, 2020
Deep Gravity: enhancing mobility flows generation with deep neural networks and geographic information

Filippo Simini, Gianni Barlacchi, Massimiliano Luca et al.

The movements of individuals within and among cities influence critical aspects of our society, such as well-being, the spreading of epidemics, and the quality of the environment. When information about mobility flows is not available for a particular region of interest, we must rely on mathematical models to generate them. In this work, we propose the Deep Gravity model, an effective method to generate flow probabilities that exploits many variables (e.g., land use, road network, transport, food, health facilities) extracted from voluntary geographic data, and uses deep neural networks to discover non-linear relationships between those variables and mobility flows. Our experiments, conducted on mobility flows in England, Italy, and New York State, show that Deep Gravity has good geographic generalization capability, achieving a significant increase in performance (especially in densely populated regions of interest) with respect to the classic gravity model and models that do not use deep neural networks or geographic data. We also show how flows generated by Deep Gravity may be explained in terms of the geographic features using explainable AI techniques.

CVJul 13, 2020
Automatic Pass Annotation from Soccer VideoStreams Based on Object Detection and LSTM

Danilo Sorano, Fabio Carrara, Paolo Cintia et al.

Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of data that describe all the spatio-temporal events that occur in each match. These events (e.g., passes, shots, fouls) are collected by human operators manually, constituting a considerable cost for data providers in terms of time and economic resources. In this paper, we describe PassNet, a method to recognize the most frequent events in soccer, i.e., passes, from video streams. Our model combines a set of artificial neural networks that perform feature extraction from video streams, object detection to identify the positions of the ball and the players, and classification of frame sequences as passes or not passes. We test PassNet on different scenarios, depending on the similarity of conditions to the match used for training. Our results show good classification results and significant improvement in the accuracy of pass detection with respect to baseline classifiers, even when the match's video conditions of the test and training sets are considerably different. PassNet is the first step towards an automated event annotation system that may break the time and the costs for event annotation, enabling data collections for minor and non-professional divisions, youth leagues and, in general, competitions whose matches are not currently annotated by data providers.

AIJun 26, 2018
Open the Black Box Data-Driven Explanation of Black Box Decision Systems

Dino Pedreschi, Fosca Giannotti, Riccardo Guidotti et al.

Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases hidden in the algorithms, due to human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We introduce the local-to-global framework for black box explanation, a novel approach with promising early results, which paves the road for a wide spectrum of future developments along three dimensions: (i) the language for expressing explanations in terms of highly expressive logic-based rules, with a statistical and causal interpretation; (ii) the inference of local explanations aimed at revealing the logic of the decision adopted for a specific instance by querying and auditing the black box in the vicinity of the target instance; (iii), the bottom-up generalization of the many local explanations into simple global ones, with algorithms that optimize the quality and comprehensibility of explanations.

APFeb 14, 2018
PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach

Luca Pappalardo, Paolo Cintia, Paolo Ferragina et al.

The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework by deploying a massive dataset of soccer-logs and consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players' evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by {\sf PlayeRank} and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. At the end, we explore some applications of PlayeRank -- i.e. searching players and player versatility --- showing its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics.

SOC-PHDec 5, 2017
Human Perception of Performance

Luca Pappalardo, Paolo Cintia, Dino Pedreschi et al.

Humans are routinely asked to evaluate the performance of other individuals, separating success from failure and affecting outcomes from science to education and sports. Yet, in many contexts, the metrics driving the human evaluation process remain unclear. Here we analyse a massive dataset capturing players' evaluations by human judges to explore human perception of performance in soccer, the world's most popular sport. We use machine learning to design an artificial judge which accurately reproduces human evaluation, allowing us to demonstrate how human observers are biased towards diverse contextual features. By investigating the structure of the artificial judge, we uncover the aspects of the players' behavior which attract the attention of human judges, demonstrating that human evaluation is based on a noticeability heuristic where only feature values far from the norm are considered to rate an individual's performance.

MLMay 23, 2017
Effective injury forecasting in soccer with GPS training data and machine learning

Alessio Rossi, Luca Pappalardo, Paolo Cintia et al.

Injuries have a great impact on professional soccer, due to their large influence on team performance and the considerable costs of rehabilitation for players. Existing studies in the literature provide just a preliminary understanding of which factors mostly affect injury risk, while an evaluation of the potential of statistical models in forecasting injuries is still missing. In this paper, we propose a multi-dimensional approach to injury forecasting in professional soccer that is based on GPS measurements and machine learning. By using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We then construct an injury forecaster and show that it is both accurate and interpretable by providing a set of case studies of interest to soccer practitioners. Our approach opens a novel perspective on injury prevention, providing a set of simple and practical rules for evaluating and interpreting the complex relations between injury risk and training performance in professional soccer.

APMay 2, 2017
Quantifying the relation between performance and success in soccer

Luca Pappalardo, Paolo Cintia

The availability of massive data about sports activities offers nowadays the opportunity to quantify the relation between performance and success. In this study, we analyze more than 6,000 games and 10 million events in six European leagues and investigate this relation in soccer competitions. We discover that a team's position in a competition's final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Moreover we find that, while victory and defeats can be explained by the team's performance during a game, it is difficult to detect draws by using a machine learning approach. We then simulate the outcomes of an entire season of each league only relying on technical data, i.e. excluding the goals scored, exploiting a machine learning model trained on data from past seasons. The simulation produces a team ranking (the PC ranking) which is close to the actual ranking, suggesting that a complex systems' view on soccer has the potential of revealing hidden patterns regarding the relation between performance and success.

SIJul 16, 2016
Data-driven generation of spatio-temporal routines in human mobility

Luca Pappalardo, Filippo Simini

The generation of realistic spatio-temporal trajectories of human mobility is of fundamental importance in a wide range of applications, such as the developing of protocols for mobile ad-hoc networks or what-if analysis in urban ecosystems. Current generative algorithms fail in accurately reproducing the individuals' recurrent schedules and at the same time in accounting for the possibility that individuals may break the routine during periods of variable duration. In this article we present DITRAS (DIary-based TRAjectory Simulator), a framework to simulate the spatio-temporal patterns of human mobility. DITRAS operates in two steps: the generation of a mobility diary and the translation of the mobility diary into a mobility trajectory. We propose a data-driven algorithm which constructs a diary generator from real data, capturing the tendency of individuals to follow or break their routine. We also propose a trajectory generator based on the concept of preferential exploration and preferential return. We instantiate DITRAS with the proposed diary and trajectory generators and compare the resulting algorithm with real data and synthetic data produced by other generative algorithms, built by instantiating DITRAS with several combinations of diary and trajectory generators. We show that the proposed algorithm reproduces the statistical properties of real trajectories in the most accurate way, making a step forward the understanding of the origin of the spatio-temporal patterns of human mobility.