ME · May 20
Z-Dip: a standardized measure for data modality assessment
Edoardo Di Martino, Matteo Cinelli, Roy Cerqueti
Detecting multimodality in empirical distributions is a fundamental problem in statistics and data analysis, with applications ranging from clustering to the study of complex systems. In practice, however, assessing departures from unimodality in a consistent and comparable way remains challenging. Widely used methods such as Hartigan and Hartigan's Dip Test illustrate these difficulties: the interpretation of their statistics depends strongly on sample size and requires calibration to determine significance, and, for large samples, the tests become increasingly sensitive, rejecting unimodality for arbitrarily small deviations from the null. We introduce Z-Dip, a standardized measure of multimodality that addresses these limitations. By treating the Dip statistic as a random variable under the null hypothesis of unimodality and standardizing its observed value, the proposed approach yields scores that are directly comparable across datasets of different sizes. Using simulation-based calibration, we derive a universal decision threshold that closely reproduces classical Dip Test decisions without requiring sample-size-specific adjustments. Extensive validation on simulated data and on more than 88,000 empirical opinion distributions shows near-perfect agreement with the classical Dip Test while providing a more interpretable and comparable measure of modality. Finally, we propose a downsampling-based correction that mitigates residual sensitivity in extremely large samples. Open-source software and reference tables are provided to facilitate practical adoption.
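The standardization idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the null is assumed to be Uniform(0, 1) samples of the same size as the data, and a simple Kolmogorov-Smirnov-style distance stands in for the Dip statistic, which is not reimplemented here. Only the final step, subtracting the simulated null mean and dividing by the simulated null standard deviation, corresponds to what the abstract describes.

```python
import random
import statistics
from typing import Callable, List, Sequence


def z_standardize(observed: float, null_values: Sequence[float]) -> float:
    """Standardize an observed statistic against simulated null values."""
    mu = statistics.fmean(null_values)
    sigma = statistics.stdev(null_values)  # sample standard deviation
    if sigma == 0:
        raise ValueError("null values have zero variance")
    return (observed - mu) / sigma


def simulate_null(statistic: Callable[[List[float]], float],
                  n: int, n_sim: int = 500, seed: int = 0) -> List[float]:
    """Evaluate `statistic` on n_sim Uniform(0, 1) samples of size n.

    Assumption: a uniform reference null; the paper's calibration may differ.
    """
    rng = random.Random(seed)
    return [statistic([rng.random() for _ in range(n)]) for _ in range(n_sim)]


def ks_to_uniform(sample: Sequence[float]) -> float:
    """Stand-in statistic (NOT the Dip): largest gap between the empirical
    CDF of the min-max rescaled sample and the Uniform(0, 1) CDF."""
    xs = sorted(sample)
    lo, hi = xs[0], xs[-1]
    if hi == lo:
        raise ValueError("sample is constant")
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        u = (x - lo) / (hi - lo)
        d = max(d, abs((i + 1) / n - u), abs(i / n - u))
    return d


if __name__ == "__main__":
    # Two well-separated clusters: a clearly non-uniform, bimodal sample.
    data = [i / 1000 for i in range(50)] + [0.95 + i / 1000 for i in range(50)]
    null = simulate_null(ks_to_uniform, n=len(data))
    print(z_standardize(ks_to_uniform(data), null))
```

Because the score is expressed in units of the null standard deviation at the matching sample size, scores from datasets of different sizes sit on a common scale, which is the property the abstract emphasizes.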
SI · May 29
Persistent Structural Inequality of Online Interactions Across Platforms
Giulio Pecile, Edoardo Di Martino, Edoardo Loru et al.
User interactions on social media platforms are unevenly distributed: a small subset of users consistently captures most of the activity, while the majority remains marginal. Although this pattern is well known and often described by power-law distributions, its consistency across time, platforms, and interaction types has not been systematically assessed. In this study, we analyze user-post bipartite networks from multiple social media platforms. We consider both active contributions (posts) and passive engagement (likes and comments), and quantify distributional properties and inequality using a KL-divergence-based model comparison, an inverse coefficient of variation, and a log-transformed Gini index. Our results show that interaction inequality remains stable over time within each platform. This holds across systems with different sizes, topical focuses, and governance models. These findings indicate that inequality in online engagement is not incidental but reflects structural constraints that shape how visibility and participation are distributed in digital environments.
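Two of the inequality measures named in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the Gini index is the plain (untransformed) version because the abstract does not define its log transformation, and the inverse coefficient of variation is taken to be the mean divided by the sample standard deviation.

```python
import statistics
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Plain Gini index of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over ascending values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def inverse_cv(values: Sequence[float]) -> float:
    """Mean divided by sample standard deviation (assumed definition)."""
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("values have zero variance")
    return statistics.fmean(values) / sd


if __name__ == "__main__":
    # Hypothetical per-user interaction counts: one user dominates.
    counts = [1, 1, 2, 3, 5, 8, 120]
    print(gini(counts), inverse_cv(counts))
```

A Gini index near 1 and an inverse coefficient of variation near 0 both indicate that a few users account for most of the activity.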
CL · Jun 27, 2025
Involvement drives complexity of language in online debates
Eleonora Amadori, Daniele Cirulli, Edoardo Di Martino et al.
Language is a fundamental aspect of human societies, continuously evolving in response to various stimuli, including societal changes and intercultural interactions. Technological advancements have profoundly transformed communication, with social media emerging as a pivotal force that merges entertainment-driven content with complex social dynamics. As these platforms reshape public discourse, analyzing the linguistic features of user-generated content is essential to understanding their broader societal impact. In this paper, we examine the linguistic complexity of content produced by influential users on Twitter across three globally significant and contested topics: COVID-19, COP26, and the Russia-Ukraine war. By combining multiple measures of textual complexity, we assess how language use varies along four key dimensions: account type, political leaning, content reliability, and sentiment. Our analysis reveals significant differences across all four axes, including variations in language complexity between individuals and organizations, between profiles with sided versus moderate political views, and between those associated with higher versus lower reliability scores. Additionally, profiles producing more negative and offensive content tend to use more complex language, with users sharing similar political stances and reliability levels converging toward a common jargon. Our findings offer new insights into the sociolinguistic dynamics of digital platforms and contribute to a deeper understanding of how language reflects ideological and social structures in online spaces.