Fabrizio Boncoraglio

ML
h-index7
4papers
7citations
Novelty55%
AI Score46

4 Papers

MLMar 26
The Rules-and-Facts Model for Simultaneous Generalization and Memorization in Neural Networks

Gabriele Farné, Fabrizio Boncoraglio, Lenka Zdeborová

A key capability of modern neural networks is their capacity to simultaneously learn underlying rules and memorize specific facts or exceptions. Yet, theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enables precise characterization of this phenomenon by bridging two classical lines of work in the statistical physics of learning: the teacher-student framework for generalization and Gardner-style capacity analysis for memorization. In the RAF model, a fraction $1 - \varepsilon$ of training labels is generated by a structured teacher rule, while a fraction $\varepsilon$ consists of unstructured facts with random labels. We characterize when the learner can simultaneously recover the underlying rule - allowing generalization to new data - and memorize the unstructured examples. Our results quantify how overparameterization enables the simultaneous realization of these two objectives: sufficient excess capacity supports memorization, while regularization and the choice of kernel or nonlinearity control the allocation of capacity between rule learning and memorization. The RAF model provides a theoretical foundation for understanding how modern neural networks can infer structure while storing rare or non-compressible information.

LGJun 2, 2025
Bayes optimal learning of attention-indexed models

Fabrizio Boncoraglio, Emanuele Troiani, Vittorio Erba et al.

We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in self-attention layers, that are key components of modern architectures.

MLSep 29, 2025
Inductive Bias and Spectral Properties of Single-Head Attention in High Dimensions

Fabrizio Boncoraglio, Vittorio Erba, Emanuele Troiani et al.

We study empirical risk minimization in a single-head tied-attention layer trained on synthetic high-dimensional sequence tasks, given by the recently introduced attention-indexed model. Using tools from random matrix theory, spin-glass physics, and approximate message passing, we derive sharp asymptotics for training and test errors, locate interpolation and recovery thresholds, and characterize the limiting spectral distribution of the learned weights. Weight decay induces an implicit nuclear-norm regularization, favoring low-rank query and key matrices. Leveraging this, we compare the standard factorized training of query and key matrices with a direct parameterization in which their product is trained element-wise, revealing the inductive bias introduced by the factorized form. Remarkably, the predicted spectral distribution echoes empirical trends reported in large-scale transformers, offering a theoretical perspective consistent with these phenomena.

MLSep 2, 2025
Inference in Spreading Processes with Neural-Network Priors

Davide Ghio, Fabrizio Boncoraglio, Lenka Zdeborová

Stochastic processes on graphs are a powerful tool for modelling complex dynamical systems such as epidemics. A recent line of work focused on the inference problem where one aims to estimate the state of every node at every time, starting from partial observation of a subset of nodes at a subset of times. In these works, the initial state of the process was assumed to be random i.i.d. over nodes. Such an assumption may not be realistic in practice, where one may have access to a set of covariate variables for every node that influence the initial state of the system. In this work, we will assume that the initial state of a node is an unknown function of such covariate variables. Given that functions can be represented by neural networks, we will study a model where the initial state is given by a simple neural network -- notably the single-layer perceptron acting on the known node-wise covariate variables. Within a Bayesian framework, we study how such neural-network prior information enhances the recovery of initial states and spreading trajectories. We derive a hybrid belief propagation and approximate message passing (BP-AMP) algorithm that handles both the spreading dynamics and the information included in the node covariates, and we assess its performance against the estimators that either use only the spreading information or use only the information from the covariate variables. We show that in some regimes, the model can exhibit first-order phase transitions when using a Rademacher distribution for the neural-network weights. These transitions create a statistical-to-computational gap where even the BP-AMP algorithm, despite the theoretical possibility of perfect recovery, fails to achieve it.