Lorenzo Marinucci

h-index: 2

4 papers

6 citations

Novelty: 51%

AI Score: 46

Ranked #39,311 of 194,257 authors (top 20%); #489 in ML (top 14%)

4 Papers

11.8 LG May 10 Code

SEMASIA: A Large-Scale Dataset of Semantically Structured Latent Representations

Mario Edoardo Pandolfo, Enrico Grimaldi, Lorenzo Marinucci et al.

Latent representations learned by neural networks often exhibit semantic structure, where concept similarity is reflected by geometric proximity in embedding space. However, comparing such spaces across models remains difficult: changes in architecture, pretraining data, objective, or random seed can yield embeddings with similar content but incompatible geometry. This latent space alignment problem is central to interpretability, transfer and multimodal learning, federated systems, and semantic communication, yet progress remains limited by the lack of large-scale, model-diverse, and metadata-rich benchmarks. To address this gap, we introduce SEMASIA, a large-scale collection of latent representations extracted from approximately 1,700 pretrained vision models across eight standard image-classification benchmarks. SEMASIA pairs embeddings with structured metadata describing architectures, training regimes, pretraining sources, and model scale. We demonstrate three applications of the resource. First, we analyze the conceptual organization of individual latent spaces, showing consistent prototype-like clustering and hierarchical semantic neighborhoods across models and datasets. Second, we benchmark supervised alignment mappings between latent spaces using reconstruction error and downstream task performance. Third, we perform a large-scale regression analysis of how pretraining-data complexity, specialization, transfer learning, augmentation, and model scale relate to geometric and probing properties of embeddings. By coupling representational scale with standardized metadata, SEMASIA provides a reproducible foundation for studying latent geometry, evaluating alignment methods, and developing next-generation heterogeneous and interoperable AI systems.
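
As an illustration of what a "supervised alignment mapping" scored by reconstruction error can look like, here is a minimal sketch that fits a least-squares linear map between two synthetic embedding spaces. The function names, the synthetic data, and the choice of a plain linear map are assumptions made for this example; the abstract does not say which mappings SEMASIA benchmarks.

```python
import numpy as np

def fit_linear_alignment(source, target):
    """Least-squares linear map W such that source @ W approximates target."""
    W, *_ = np.linalg.lstsq(source, target, rcond=None)
    return W

def reconstruction_error(source, target, W):
    """Mean squared error between the mapped source embeddings and the target."""
    return float(np.mean((source @ W - target) ** 2))

# Synthetic stand-in (assumption): two "models" embed the same content
# through different random linear mixings, giving incompatible geometry.
rng = np.random.default_rng(0)
content = rng.normal(size=(200, 8))
space_a = content @ rng.normal(size=(8, 8))
space_b = content @ rng.normal(size=(8, 8))

W = fit_linear_alignment(space_a, space_b)
err = reconstruction_error(space_a, space_b, W)
print(f"reconstruction MSE: {err:.2e}")
```

Because both synthetic spaces are invertible linear images of the same content, a linear map recovers one from the other almost exactly. Real embeddings from different architectures would generally leave a nonzero residual.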

4.5 ML Dec 3, 2025

Colored Markov Random Fields for Probabilistic Topological Modeling

Lorenzo Marinucci, Leonardo Di Nino, Gabriele D'Acunto et al.

Probabilistic Graphical Models (PGMs) encode conditional dependencies among random variables using a graph (nodes for variables, links for dependencies) and factorize the joint distribution into lower-dimensional components. This makes PGMs well-suited for analyzing complex systems and supporting decision-making. Recent advances in topological signal processing highlight the importance of variables defined on topological spaces in several application domains. In such cases, the underlying topology shapes statistical relationships, limiting the expressiveness of canonical PGMs. To overcome this limitation, we introduce Colored Markov Random Fields (CMRFs), which model both conditional and marginal dependencies among Gaussian edge variables on topological spaces, with a theoretical foundation in Hodge theory. CMRFs extend classical Gaussian Markov Random Fields by including link coloring: connectivity encodes conditional independence, while color encodes marginal independence. We quantify the benefits of CMRFs through a distributed estimation case study over a physical network, comparing them with baselines that use different levels of topological prior knowledge.
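
The distinction between conditional and marginal independence can be seen in a classical Gaussian Markov Random Field: a zero in the precision matrix means two variables are conditionally independent given the rest, while a zero in the covariance matrix means they are marginally independent. The sketch below uses a hypothetical three-variable chain to show that the first does not imply the second. It illustrates only this standard Gaussian fact, not the CMRF coloring construction itself.

```python
import numpy as np

# Precision matrix of a chain x0 - x1 - x2 (illustrative values).
# The zero at (0, 2) means x0 and x2 are conditionally independent given x1.
precision = np.array([[ 2.0, -1.0,  0.0],
                      [-1.0,  2.0, -1.0],
                      [ 0.0, -1.0,  2.0]])

# The covariance is the inverse of the precision matrix.
covariance = np.linalg.inv(precision)

print("precision[0, 2]  =", precision[0, 2])    # zero: no link between x0 and x2
print("covariance[0, 2] =", covariance[0, 2])   # nonzero: still marginally dependent
```

Here x0 and x2 share no link, yet they remain marginally correlated through x1, which is why a model needs a separate mechanism to express marginal independence.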

2.3 SP May 29, 2025

Topological Adaptive Least Mean Squares Algorithms over Simplicial Complexes

Lorenzo Marinucci, Claudio Battiloro, Paolo Di Lorenzo

This paper introduces a novel adaptive framework for processing dynamic flow signals over simplicial complexes, extending classical least-mean-squares (LMS) methods to high-order topological domains. Building on discrete Hodge theory, we present a topological LMS algorithm that efficiently processes streaming signals observed over time-varying edge subsets. We provide a detailed stochastic analysis of the algorithm, deriving its stability conditions, steady-state mean-square-error, and convergence speed, while exploring the impact of edge sampling on performance. We also propose strategies to design optimal edge sampling probabilities, minimizing rate while ensuring desired estimation accuracy. Assuming partial knowledge of the complex structure (e.g., the underlying graph), we introduce an adaptive topology inference method that integrates with the proposed LMS framework. Additionally, we propose a distributed version of the algorithm and analyze its stability and mean-square-error properties. Empirical results on synthetic and real-world traffic data demonstrate that our approach, in both centralized and distributed settings, outperforms graph-based LMS methods by leveraging higher-order topological features.
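
To give a rough idea of LMS-style estimation of an edge flow from randomly sampled edges, here is a minimal sketch on a toy simplicial complex. The complex, the assumption that the flow lies in the span of the two lowest-frequency eigenvectors of the first-order Hodge Laplacian, the step size, and the sampling probability are all choices made for this example. The update rule is modeled on standard projected LMS and should not be read as the paper's exact algorithm.

```python
import numpy as np

# Toy complex (assumption): 4 nodes, 5 oriented edges, one filled triangle (0, 1, 2).
# Edge order: (0,1), (0,2), (1,2), (1,3), (2,3).
B1 = np.array([[-1, -1,  0,  0,  0],
               [ 1,  0, -1, -1,  0],
               [ 0,  1,  1,  0, -1],
               [ 0,  0,  0,  1,  1]], dtype=float)       # node-to-edge incidence
B2 = np.array([[1], [-1], [1], [0], [0]], dtype=float)   # edge-to-triangle incidence
L1 = B1.T @ B1 + B2 @ B2.T                               # first-order Hodge Laplacian

# Assume the true flow lies in the span of the two lowest-frequency eigenvectors.
eigvals, eigvecs = np.linalg.eigh(L1)
U = eigvecs[:, :2]
P = U @ U.T                                              # projector onto that subspace

rng = np.random.default_rng(0)
true_flow = U @ np.array([1.0, -0.5])
estimate = np.zeros(5)
mu, p_sample, noise_std = 0.5, 0.6, 0.01                 # illustrative values

for _ in range(500):
    # Each edge is observed independently with probability p_sample.
    mask = (rng.random(5) < p_sample).astype(float)
    observed = mask * (true_flow + noise_std * rng.normal(size=5))
    # Projected LMS step: correct only on sampled edges, then project.
    estimate = estimate + mu * P @ (mask * (observed - estimate))

rel_error = np.linalg.norm(estimate - true_flow) / np.linalg.norm(true_flow)
print(f"relative error: {rel_error:.3f}")
```

In this sketch the projector `P` is what lets information from the sampled edges spread to the unsampled ones, so the estimate can converge even though no single step observes every edge.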

7.8 ML Oct 14, 2025

Simplicial Gaussian Models: Representation and Inference

Lorenzo Marinucci, Gabriele D'Acunto, Paolo Di Lorenzo et al.

Probabilistic graphical models (PGMs) are powerful tools for representing statistical dependencies through graphs in high-dimensional systems. However, they are limited to pairwise interactions. In this work, we propose the simplicial Gaussian model (SGM), which extends Gaussian PGMs to simplicial complexes. SGM jointly models random variables supported on vertices, edges, and triangles within a single parametrized Gaussian distribution. Our model builds upon discrete Hodge theory and incorporates uncertainty at every topological level through independent random components. Motivated by applications, we focus on the marginal edge-level distribution while treating node- and triangle-level variables as latent. We then develop a maximum-likelihood inference algorithm to recover the parameters of the full SGM and the induced conditional dependence structure. Numerical experiments on synthetic simplicial complexes with varying size and sparsity confirm the effectiveness of our algorithm.