CL Nov 5, 2025
The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013-2023
Stefano M. Iacus, Devika Jain, Andrea Nasuto et al.
Quantifying human flourishing, a multidimensional construct including happiness, health, purpose, virtue, relationships, and financial stability, is critical for understanding societal well-being beyond economic indicators. Existing measures often lack fine spatial and temporal resolution. Here we introduce the Human Flourishing Geographic Index (HFGI), derived from analyzing approximately 2.6 billion geolocated U.S. tweets (2013-2023) using fine-tuned large language models to classify expressions across 48 indicators aligned with Harvard's Global Flourishing Study framework plus attitudes towards migration and perception of corruption. The dataset offers monthly and yearly county- and state-level indicators of flourishing-related discourse, validated to confirm that the measures accurately represent the underlying constructs and show expected correlations with established indicators. This resource enables multidisciplinary analyses of well-being, inequality, and social change at unprecedented resolution, offering insights into the dynamics of human flourishing as reflected in social media discourse across the United States over the past decade.
CL Oct 31, 2024 Code
Rethinking Scale: The Efficacy of Fine-Tuned Open-Source LLMs in Large-Scale Reproducible Social Science Research
Marcello Carammia, Stefano Maria Iacus, Giuseppe Porro
Large Language Models (LLMs) are distinguished by their architecture, which dictates their parameter size and performance capabilities. Social scientists have increasingly adopted LLMs for text classification tasks, which are difficult to scale with human coders. While very large, closed-source models often deliver superior performance, their use presents significant risks. These include lack of transparency, potential exposure of sensitive data, challenges to replicability, and dependence on proprietary systems. Additionally, their high costs make them impractical for large-scale research projects. In contrast, open-source models, although available in various sizes, may underperform compared to commercial alternatives if used without further fine-tuning. However, open-source models offer distinct advantages: they can be run locally (ensuring data privacy), fine-tuned for specific tasks, shared within the research community, and integrated into reproducible workflows. This study demonstrates that small, fine-tuned open-source LLMs can achieve equal or superior performance to models such as ChatGPT-4. We further explore the relationship between training set size and fine-tuning efficacy in open-source models. Finally, we propose a hybrid workflow that leverages the strengths of both open and closed models, offering a balanced approach to performance, transparency, and reproducibility.
AP Dec 24, 2025
Dynamic Attention (DynAttn): Interpretable High-Dimensional Spatio-Temporal Forecasting (with Application to Conflict Fatalities)
Stefano M. Iacus, Haodong Qi, Marcello Carammia et al.
Forecasting conflict-related fatalities remains a central challenge in political science and policy analysis due to the sparse, bursty, and highly non-stationary nature of violence data. We introduce DynAttn, an interpretable dynamic-attention forecasting framework for high-dimensional spatio-temporal count processes. DynAttn combines rolling-window estimation, shared elastic-net feature gating, a compact weight-tied self-attention encoder, and a zero-inflated negative binomial (ZINB) likelihood. This architecture produces calibrated multi-horizon forecasts of expected casualties and exceedance probabilities, while retaining transparent diagnostics through feature gates, ablation analysis, and elasticity measures. We evaluate DynAttn using global country-level and high-resolution PRIO-grid-level conflict data from the VIEWS forecasting system, benchmarking it against established statistical and machine-learning approaches, including DynENet, LSTM, Prophet, PatchTST, and the official VIEWS baseline. Across forecast horizons from one to twelve months, DynAttn consistently achieves substantially higher predictive accuracy, with particularly large gains in sparse grid-level settings where competing models often become unstable or degrade sharply. Beyond predictive performance, DynAttn enables structured interpretation of regional conflict dynamics. In our application, cross-regional analyses show that short-run conflict persistence and spatial diffusion form the core predictive backbone, while climate stress acts either as a conditional amplifier or a primary driver depending on the conflict theater.
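The zero-inflated negative binomial likelihood named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook ZINB form with a mean/dispersion parameterisation (variance = mu + alpha * mu^2); the abstract does not state which parameterisation DynAttn uses, so that choice, the function names, and the example parameter values are all assumptions made here for illustration.

```python
import math


def nb_log_pmf(y, mu, alpha):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion alpha > 0.

    Assumed parameterisation: variance = mu + alpha * mu**2.
    """
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p)
        + y * math.log(mu / (r + mu))
    )


def zinb_pmf(y, mu, alpha, pi):
    """ZINB pmf: with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    zero_mass = pi if y == 0 else 0.0
    return zero_mass + (1.0 - pi) * math.exp(nb_log_pmf(y, mu, alpha))


def zinb_mean(mu, pi):
    """Expected count under the ZINB model."""
    return (1.0 - pi) * mu


def exceedance_prob(k, mu, alpha, pi):
    """P(Y > k): probability that the count exceeds threshold k."""
    return 1.0 - sum(zinb_pmf(y, mu, alpha, pi) for y in range(k + 1))


if __name__ == "__main__":
    # Hypothetical parameters for one grid cell and month.
    mu, alpha, pi = 2.0, 0.5, 0.3
    print("expected fatalities:", zinb_mean(mu, pi))
    print("P(Y > 0):", exceedance_prob(0, mu, alpha, pi))
    print("P(Y > 10):", exceedance_prob(10, mu, alpha, pi))
```

The zero-inflation weight `pi` absorbs the many cell-months with no recorded violence, while the dispersion `alpha` lets the count component accommodate bursty outbreaks, which is plausibly why this family suits sparse conflict data.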
AP Nov 9, 2020
Forecasting asylum-related migration flows with machine learning and data at scale
Marcello Carammia, Stefano Maria Iacus, Teddy Wilkin
The effects of the so-called "refugee crisis" of 2015-16 continue to dominate the political agenda in Europe. Migration flows were sudden and unexpected, leaving governments unprepared and exposing significant shortcomings in the field of migration forecasting. Migration is a complex system typified by episodic variation, underpinned by causal factors that are interacting, highly context-dependent and short-lived. Correspondingly, migration monitoring relies on scattered data, while approaches to forecasting focus on specific migration flows and often have inconsistent results that are difficult to generalise at the regional or global levels. Here we show that adaptive machine learning algorithms that integrate official statistics and non-traditional data sources at scale can effectively forecast asylum-related migration flows. We focus on asylum applications lodged in countries of the European Union (EU) by nationals of all countries of origin worldwide; the same approach can be applied in any context provided adequate migration or asylum data are available. We exploit three tiers of data - geolocated events and internet searches in countries of origin, detections of irregular crossings at the EU border, and asylum recognition rates in countries of destination - to effectively forecast individual asylum-migration flows up to four weeks ahead with high accuracy. Uniquely, our approach a) monitors potential drivers of migration in countries of origin to detect changes at an early stage; b) models individual country-to-country migration flows separately and on moving time windows; c) estimates the effects of individual drivers, including lagged effects; d) provides forecasts of asylum applications up to four weeks ahead; e) assesses how patterns of drivers shift over time to describe the functioning and change of migration systems.
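The idea of modelling one flow on moving time windows with a lagged driver can be sketched as follows. This is only an illustration under strong assumptions: the abstract does not name the learning algorithm, so a one-variable least-squares fit stands in for it, and the series, the function names, the window length, and the lag are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def rolling_forecasts(driver, flow, window, lag):
    """Refit on a trailing window at every step and forecast `lag` steps ahead.

    At time t the model is trained on pairs (driver[s - lag], flow[s]) for the
    last `window` observed targets s, then applied to driver[t] to predict
    flow[t + lag]. Returns a list of (target_index, forecast) tuples.
    """
    forecasts = []
    for t in range(window + lag - 1, len(flow)):
        targets = range(t - window + 1, t + 1)
        xs = [driver[s - lag] for s in targets]
        ys = [flow[s] for s in targets]
        intercept, slope = fit_line(xs, ys)
        forecasts.append((t + lag, intercept + slope * driver[t]))
    return forecasts


if __name__ == "__main__":
    # Hypothetical weekly series: a driver signal in the country of origin
    # and asylum applications that respond to it two weeks later.
    driver = [float(v) for v in range(1, 11)]
    flow = [0.0, 0.0] + [3.0 + 2.0 * d for d in driver[:-2]]
    for week, value in rolling_forecasts(driver, flow, window=4, lag=2):
        print(f"week {week}: forecast {value:.1f}")
```

Because the coefficients are re-estimated at each step, a driver that matters in one period can fade in the next, which mirrors the abstract's point that the effects of drivers shift over time.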