MLMar 22, 2022
Learning curves for the multi-class teacher-student perceptronElisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga et al.
One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for both the Bayes-optimal and ERM generalisation errors in the high-dimensional regime. For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality. In particular, we observe that regularised cross-entropy minimisation yields close-to-optimal accuracy. Instead, for a binary teacher we show that a first-order phase transition arises in the Bayes-optimal performance.
STFeb 1, 2025
Analysis of Diffusion Models for Manifold DataAnand Jerry George, Rodrigo Veiga, Nicolas Macris
We analyze the time reversed dynamics of generative diffusion models. If the exact empirical score function is used in a regime of large dimension and exponentially large number of samples, these models are known to undergo transitions between distinct dynamical regimes. We extend this analysis and compute the transitions for an analytically tractable manifold model where the statistical model for the data is a mixture of lower dimensional Gaussians embedded in higher dimensional space. We compute the so-called speciation and collapse transition times, as a function of the ratio of manifold-to-ambient space dimensions, and other characteristics of the data model. An important tool used in our analysis is the exact formula for the mutual information (or free energy) of Generalized Linear Models.
LGFeb 1, 2025
Denoising Score Matching with Random Features: Insights on Diffusion Models from Precise Learning CurvesAnand Jerry George, Rodrigo Veiga, Nicolas Macris
We theoretically investigate the phenomena of generalization and memorization in diffusion models. Empirical studies suggest that these phenomena are influenced by model complexity and the size of the training dataset. In our experiments, we further observe that the number of noise samples per data sample ($m$) used during Denoising Score Matching (DSM) plays a significant and non-trivial role. We capture these behaviors and shed insights into their mechanisms by deriving asymptotically precise expressions for test and train errors of DSM under a simple theoretical setting. The score function is parameterized by random features neural networks, with the target distribution being $d$-dimensional Gaussian. We operate in a regime where the dimension $d$, number of data samples $n$, and number of features $p$ tend to infinity while keeping the ratios $ψ_n=\frac{n}{d}$ and $ψ_p=\frac{p}{d}$ fixed. By characterizing the test and train errors, we identify regimes of generalization and memorization as a function of $ψ_n,ψ_p$, and $m$. Our theoretical findings are consistent with the empirical observations.
MLFeb 12, 2024
Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak FeaturesRodrigo Veiga, Anastasia Remizova, Nicolas Macris
We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.
IRMay 7, 2023
Extracting Blockchain Concepts from TextRodrigo Veiga, Markus Endler, Valeria de Paiva
Blockchains provide a mechanism through which mutually distrustful remote parties can reach consensus on the state of a ledger of information. With the great acceleration with which this space is developed, the demand for those seeking to learn about blockchain also grows. Being a technical subject, it can be quite intimidating to start learning. For this reason, the main objective of this project was to apply machine learning models to extract information from whitepapers and academic articles focused on the blockchain area to organize this information and aid users to navigate the space.
MLFeb 1, 2022
Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networksRodrigo Veiga, Ludovic Stephan, Bruno Loureiro et al.
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.
STAT-MECHJun 17, 2020
Restricted Boltzmann Machine Flows and The Critical Temperature of Ising modelsRodrigo Veiga, Renato Vicente
We explore alternative experimental setups for the iterative sampling (flow) from Restricted Boltzmann Machines (RBM) mapped on the temperature space of square lattice Ising models by a neural network thermometer. This framework has been introduced to explore connections between RBM-based deep neural networks and the Renormalization Group (RG). It has been found that, under certain conditions, the flow of an RBM trained with Ising spin configurations approaches in the temperature space a value around the critical one: $ k_B T_c / J \approx 2.269$. In this paper we consider datasets with no information about model topology to argue that a neural network thermometer is not an accurate way to detect whether the RBM has learned scale invariance or not.
SOC-PHJun 8, 2020
Age-structured estimation of COVID-19 ICU demand from low quality dataRodrigo Veiga, Rodrigo Murta, Renato Vicente
We sample aggravated cases following age-structured probabilities from confirmed cases and use ICU occupation data to find a subnotification factor. A logistic fit is then employed to project the progression of the COVID-19 epidemic with plateau scenarios taken from locations that have reached this stage. Finally, the logistic curve found is corrected by the subnotification factor and sampled to project the future demand for ICU beds.