LG · Jun 2, 2022
Learning a Restricted Boltzmann Machine using biased Monte Carlo sampling
Nicolas Béreux, Aurélien Decelle, Cyril Furtlehner et al.
Restricted Boltzmann Machines are simple and powerful generative models that can encode any complex dataset. Despite all their advantages, in practice the trainings are often unstable and it is difficult to assess their quality because the dynamics are affected by extremely slow time dependencies. This situation becomes critical when dealing with low-dimensional clustered datasets, where the time required to sample ergodically the trained models becomes computationally prohibitive. In this work, we show that this divergence of Monte Carlo mixing times is related to a phenomenon of phase coexistence, similar to that which occurs in physics near a first-order phase transition. We show that sampling the equilibrium distribution using the Markov chain Monte Carlo method can be dramatically accelerated when using biased sampling techniques, in particular the Tethered Monte Carlo (TMC) method. This sampling technique efficiently solves the problem of evaluating the quality of a given trained model and generating new samples in a reasonable amount of time. Moreover, we show that this sampling technique can also be used to improve the computation of the log-likelihood gradient during training, leading to dramatic improvements in training RBMs with artificial clustered datasets. On real low-dimensional datasets, this new training method fits RBM models with significantly faster relaxation dynamics than those obtained with standard PCD recipes. We also show that TMC sampling can be used to recover the free-energy profile of the RBM. This proves to be extremely useful to compute the probability distribution of a given model and to improve the generation of new decorrelated samples in slow PCD-trained models.
DIS-NN · Sep 5, 2023
Inferring effective couplings with Restricted Boltzmann Machines
Aurélien Decelle, Cyril Furtlehner, Alfonso De Jesus Navas Gómez et al.
Generative models offer a direct way of modeling complex data. Energy-based models attempt to encode the statistical correlations observed in the data at the level of the Boltzmann weight associated with an energy function in the form of a neural network. We address here the challenge of understanding the physical interpretation of such models. In this study, we propose a simple solution by implementing a direct mapping between the Restricted Boltzmann Machine and an effective Ising spin Hamiltonian. This mapping includes interactions of all possible orders, going beyond the conventional pairwise interactions typically considered in the inverse Ising (or Boltzmann Machine) approach, and allowing the description of complex datasets. Earlier works attempted to achieve this goal, but the proposed mappings were inaccurate for inference applications, did not properly treat the complexity of the problem, or did not provide precise prescriptions for practical application. To validate our method, we performed several controlled inverse numerical experiments in which we trained the RBMs using equilibrium samples of predefined models with local external fields, 2-body and 3-body interactions in different sparse topologies. The results demonstrate the effectiveness of our proposed approach in learning the correct interaction network and pave the way for its application in modeling interesting binary variable datasets. We also evaluate the quality of the inferred model based on different training methods.
LG · May 24, 2024
Fast training and sampling of Restricted Boltzmann Machines
Nicolas Béreux, Aurélien Decelle, Cyril Furtlehner et al.
Restricted Boltzmann Machines (RBMs) are effective tools for modeling complex systems and deriving insights from data. However, training these models with highly structured data presents significant challenges due to the slow mixing characteristics of Markov Chain Monte Carlo processes. In this study, we build upon recent theoretical advancements in RBM training, to significantly reduce the computational cost of training (in very clustered datasets), evaluating and sampling in RBMs in general. The learning process is analogous to thermodynamic continuous phase transitions observed in ferromagnetic models, where new modes in the probability measure emerge in a continuous manner. Such continuous transitions are associated with the critical slowdown effect, which adversely affects the accuracy of gradient estimates, particularly during the initial stages of training with clustered data. To mitigate this issue, we propose a pre-training phase that encodes the principal components into a low-rank RBM through a convex optimization process. This approach enables efficient static Monte Carlo sampling and accurate computation of the partition function. We exploit the continuous and smooth nature of the parameter annealing trajectory to achieve reliable and computationally efficient log-likelihood estimations, enabling online assessment during the training, and propose a novel sampling strategy named parallel trajectory tempering (PTT) which outperforms previously optimized MCMC methods. Our results show that this training strategy enables RBMs to effectively address highly structured datasets that conventional methods struggle with. We also provide evidence that our log-likelihood estimation is more accurate than traditional, more computationally intensive approaches in controlled scenarios. The PTT algorithm significantly accelerates MCMC processes compared to existing and conventional methods.
LG · Dec 14, 2024
ANaGRAM: A Natural Gradient Relative to Adapted Model for efficient PINNs learning
Nilo Schwencke, Cyril Furtlehner
In recent years, Physics Informed Neural Networks (PINNs) have received strong interest as a method to solve PDE-driven systems, in particular for data assimilation purposes. This method is still in its infancy, with many shortcomings and failures that remain poorly understood. In this paper we propose a natural gradient approach to PINNs that speeds up training and improves its accuracy. Based on an in-depth analysis of the differential geometric structures of the problem, we come up with two distinct contributions: (i) a new natural gradient algorithm that scales as $\min(P^2S, S^2P)$, where $P$ is the number of parameters and $S$ the batch size; (ii) a mathematically principled reformulation of the PINNs problem that allows the extension of natural gradient to it, with proved connections to Green's function theory.
LG · Jan 31, 2025
A theoretical framework for overfitting in energy-based modeling
Giovanni Catania, Aurélien Decelle, Cyril Furtlehner et al.
We investigate the impact of limited data on training pairwise energy-based models for inverse problems aimed at identifying interaction networks. Utilizing the Gaussian model as testbed, we dissect training trajectories across the eigenbasis of the coupling matrix, exploiting the independent evolution of eigenmodes and revealing that the learning timescales are tied to the spectral decomposition of the empirical covariance matrix. We see that optimal points for early stopping arise from the interplay between these timescales and the initial conditions of training. Moreover, we show that finite data corrections can be accurately modeled through asymptotic random matrix theory calculations and provide the counterpart of generalized cross-validation in the energy based model context. Our analytical framework extends to binary-variable maximum-entropy pairwise models with minimal variations. These findings offer strategies to control overfitting in discrete-variable models through empirical shrinkage corrections, improving the management of overfitting in energy-based generative models. Finally, we propose a generalization to arbitrary energy-based models by deriving the neural tangent kernel dynamics of the score function under the score-matching algorithm.
DIS-NN · Jan 21
Learning and extrapolating scale-invariant processes
Anaclara Alvez-Canepa, Cyril Furtlehner, François Landes
Machine Learning (ML) has recently transformed some fields, such as Language and Vision, and we may expect it to be relevant to the analysis of complex systems as well. Here we tackle the question of how, and to what extent, one can regress scale-free processes, i.e. processes displaying power-law behavior, like earthquakes or avalanches. We are interested in predicting the large ones, i.e. rare events in the training set, which therefore require extrapolation capabilities of the model. For this we consider two paradigmatic problems that are statistically self-similar. The first one is a 2-dimensional fractional Gaussian field obeying linear dynamics, self-similar by construction and amenable to exact analysis. The second one is the Abelian sandpile model, exhibiting self-organized criticality. The emerging paradigm of Geometric Deep Learning shows that including known symmetries in the model's architecture is key to success. Here one may hope to extrapolate only by leveraging scale invariance. This is however a peculiar symmetry, as it involves possibly non-trivial coarse-graining operations and anomalous scaling. We perform experiments on various existing architectures like U-net, Riesz network (scale invariant by construction), or our own proposals: a wavelet-decomposition based Graph Neural Network (with discrete scale symmetry), a Fourier embedding layer and a Fourier-Mellin Neural Operator. Based on these experiments and a complete characterization of the linear case, we identify the main issues relative to spectral biases and coarse-grained representations, and discuss how to alleviate them with the relevant inductive biases.
LG · Oct 28, 2025
PRIVET: Privacy Metric Based on Extreme Value Theory
Antoine Szatkownik, Aurélien Decelle, Beatriz Seoane et al.
Deep generative models are often trained on sensitive data, such as genetic sequences, health data, or more broadly, any copyrighted, licensed or protected content. This raises critical concerns around privacy-preserving synthetic data, and more specifically around privacy leakage, an issue closely tied to overfitting. Existing methods almost exclusively rely on global criteria to estimate the risk of privacy failure associated to a model, offering only quantitative non interpretable insights. The absence of rigorous evaluation methods for data privacy at the sample-level may hinder the practical deployment of synthetic data in real-world applications. Using extreme value statistics on nearest-neighbor distances, we propose PRIVET, a generic sample-based, modality-agnostic algorithm that assigns an individual privacy leak score to each synthetic sample. We empirically demonstrate that PRIVET reliably detects instances of memorization and privacy leakage across diverse data modalities, including settings with very high dimensionality, limited sample sizes such as genetic data and even under underfitting regimes. We compare our method to existing approaches under controlled settings and show its advantage in providing both dataset level and sample level assessments through qualitative and quantitative outputs. Additionally, our analysis reveals limitations in existing computer vision embeddings to yield perceptually meaningful distances when identifying near-duplicate samples.
LG · Oct 14, 2025
AMStraMGRAM: Adaptive Multi-cutoff Strategy Modification for ANaGRAM
Nilo Schwencke, Cyriaque Rousselot, Alena Shilova et al.
Recent works have shown that natural gradient methods can significantly outperform standard optimizers when training physics-informed neural networks (PINNs). In this paper, we analyze the training dynamics of PINNs optimized with ANaGRAM, a natural-gradient-inspired approach employing singular value decomposition with cutoff regularization. Building on this analysis, we propose a multi-cutoff adaptation strategy that further enhances ANaGRAM's performance. Experiments on benchmark PDEs validate the effectiveness of our method, which reaches machine precision in some experiments. To provide theoretical grounding, we develop a framework based on spectral theory that explains the necessity of regularization, and extend previously shown connections with Green's function theory.
STAT-MECH · Sep 9, 2025
Building causation links in stochastic nonlinear systems from data
Sergio Chibbaro, Cyril Furtlehner, Théo Marchetta et al.
Causal relationships play a fundamental role in understanding the world around us. The ability to identify and understand cause-effect relationships is critical to making informed decisions, predicting outcomes, and developing effective strategies. However, deciphering causal relationships from observational data is a difficult task, as correlations alone may not provide definitive evidence of causality. In recent years, the field of machine learning (ML) has emerged as a powerful tool, offering new opportunities for uncovering hidden causal mechanisms and better understanding complex systems. In this work, we address the issue of detecting the intrinsic causal links of a large class of complex systems in the framework of the response theory in physics. We develop some theoretical ideas put forward by [1], and technically we use state-of-the-art ML techniques to build up models from data. We consider both linear stochastic and non-linear systems. Finally, we compute the asymptotic efficiency of the linear response based causal predictor in a case of large scale Markov process network of linear interactions.
LG · May 28, 2021
Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines
Aurélien Decelle, Cyril Furtlehner, Beatriz Seoane
Training Restricted Boltzmann Machines (RBMs) has been challenging for a long time due to the difficulty of computing precisely the log-likelihood gradient. Over the past decades, many works have proposed more or less successful training recipes but without studying the crucial quantity of the problem: the mixing time, i.e. the number of Monte Carlo iterations needed to sample new configurations from a model. In this work, we show that this mixing time plays a crucial role in the dynamics and stability of the trained model, and that RBMs operate in two well-defined regimes, namely equilibrium and out-of-equilibrium, depending on the interplay between this mixing time of the model and the number of steps, $k$, used to approximate the gradient. We further show empirically that this mixing time increases with the learning, which often implies a transition from one regime to another as soon as $k$ becomes smaller than this time. In particular, we show that using the popular $k$ (persistent) contrastive divergence approaches, with $k$ small, the dynamics of the learned model are extremely slow and often dominated by strong out-of-equilibrium effects. On the contrary, RBMs trained in equilibrium display faster dynamics, and a smooth convergence to dataset-like configurations during the sampling. Finally we discuss how to exploit in practice both regimes depending on the task one aims to fulfill: (i) short $k$ can be used to generate convincing samples in short learning times, (ii) large $k$ (or increasingly large) is needed to learn the correct equilibrium distribution of the RBM. Finally, the existence of these two operational regimes seems to be a general property of energy based models trained via likelihood maximization.
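To make the role of the step count $k$ concrete, here is a minimal sketch of $k$ alternating block-Gibbs updates in a binary RBM. It is not the authors' code: it assumes {0,1} units with energy $E(v,h) = -v^\top W h - b_{vis}^\top v - b_{hid}^\top h$, and every name (`gibbs_k_steps`, `b_vis`, `b_hid`, the toy sizes) is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_k_steps(v, W, b_vis, b_hid, k, rng):
    """Run k alternating block-Gibbs updates starting from visible states v.

    Assumed model (illustration only): binary {0,1} units,
    W of shape (n_vis, n_hid), v of shape (n_chains, n_vis).
    """
    for _ in range(k):
        # Sample all hidden units given the visible layer.
        p_h = sigmoid(v @ W + b_hid)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Sample all visible units given the hidden layer.
        p_v = sigmoid(h @ W.T + b_vis)
        v = (rng.random(p_v.shape) < p_v).astype(float)
    return v

# Toy setup with random small weights (hypothetical sizes).
rng = np.random.default_rng(0)
n_vis, n_hid, n_chains = 6, 3, 4
W = 0.1 * rng.standard_normal((n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)
v0 = rng.integers(0, 2, size=(n_chains, n_vis)).astype(float)
v_k = gibbs_k_steps(v0, W, b_vis, b_hid, k=10, rng=rng)
```

In a contrastive-divergence-style scheme the chains would start from data at every update, while in a persistent scheme `v_k` would be stored and reused as the next starting point. Whether `k` is large enough depends on the mixing time of the current model, which the abstract reports to grow during learning.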
DIS-NN · Nov 23, 2020
Restricted Boltzmann Machine, recent advances and mean-field theory
Aurélien Decelle, Cyril Furtlehner
This review deals with the Restricted Boltzmann Machine (RBM) in the light of statistical physics. The RBM is a classical family of machine learning (ML) models which played a central role in the development of deep learning. Viewing it as a spin glass model and exhibiting various links with other models of statistical physics, we gather recent results dealing with mean-field theory in this context. First, the functioning of the RBM can be analyzed via the phase diagrams obtained for various statistical ensembles of RBMs, leading in particular to the identification of a compositional phase where a small number of features or modes are combined to form complex patterns. Then we discuss recent works that either devise mean-field based learning algorithms or reproduce generic aspects of the learning process from some ensemble dynamics equations and/or from linear stability arguments.
LG · Dec 19, 2019
Robust Multi-Output Learning with Highly Incomplete Data via Restricted Boltzmann Machines
Giancarlo Fissore, Aurélien Decelle, Cyril Furtlehner et al.
In a standard multi-output classification scenario, both features and labels of training data are partially observed. This challenging issue is widely witnessed due to sensor or database failures, crowd-sourcing and noisy communication channels in industrial data analytic services. Classic methods for handling multi-output classification with incomplete supervision information usually decompose the problem into an imputation stage that reconstructs the missing training information, and a learning stage that builds a classifier based on the imputed training set. These methods fail to fully leverage the dependencies between features and labels. In order to take full advantage of these dependencies we consider a purely probabilistic setting in which the features imputation and multi-label classification problems are jointly solved. Indeed, we show that a simple Restricted Boltzmann Machine can be trained with an adapted algorithm based on mean-field equations to efficiently solve problems of inductive and transductive learning in which both features and labels are missing at random. The effectiveness of the approach is demonstrated empirically on various datasets, with particular focus on a real-world Internet-of-Things security dataset.
DIS-NN · Oct 31, 2019
Gaussian-Spherical Restricted Boltzmann Machines
Aurélien Decelle, Cyril Furtlehner
We consider a special type of Restricted Boltzmann Machine (RBM), namely a Gaussian-spherical RBM where the visible units have Gaussian priors while the vector of hidden variables is constrained to stay on an $\mathbb{L}_2$ sphere. Since the spherical constraint admits exact asymptotic treatments, various scaling regimes are explicitly identified based solely on the spectral properties of the coupling matrix (also called the weight matrix of the RBM). Incidentally, these happen to be formally related to similar scaling behaviours obtained in a different context dealing with spatial condensation of zero range processes. More specifically, when the spectrum of the coupling matrix is doubly degenerate, an exact treatment can be proposed to deal with finite size effects. Interestingly, the known parallel between the ferromagnetic transition of the spherical model and the Bose-Einstein condensation can be made explicit in that case. More importantly, this gives us the ability to extract all needed response functions with arbitrary precision for the training algorithm of the RBM. This then allows us to numerically integrate the dynamics of the spectrum of the weight matrix during learning in a precise way. These dynamics reveal in particular a sequential emergence of modes from the Marchenko-Pastur bulk of singular vectors of the coupling matrix.
DIS-NN · Mar 5, 2018
Thermodynamics of Restricted Boltzmann Machines and related learning dynamics
Aurélien Decelle, Giancarlo Fissore, Cyril Furtlehner
We investigate the thermodynamic properties of a Restricted Boltzmann Machine (RBM), a simple energy-based generative model used in the context of unsupervised learning. Assuming the information content of this model to be mainly reflected by the spectral properties of its weight matrix $W$, we try to make a realistic analysis by averaging over an appropriate statistical ensemble of RBMs. First, a phase diagram is derived. Otherwise similar to that of the Sherrington-Kirkpatrick (SK) model with ferromagnetic couplings, the RBM's phase diagram presents a ferromagnetic phase which may or may not be of compositional type depending on the kurtosis of the distribution of the components of the singular vectors of $W$. Subsequently, the learning dynamics of the RBM is studied in the thermodynamic limit. A "typical" learning trajectory is shown to solve an effective dynamical equation, based on the aforementioned ensemble average and explicitly involving order parameters obtained from the thermodynamic analysis. In particular, this lets us show how the evolution of the dominant singular values of $W$, and thus of the unstable modes, is driven by the input data. At the beginning of the training, in which the RBM is found to operate in the linear regime, the unstable modes reflect the dominant covariance modes of the data. In the non-linear regime, instead, the selected modes interact and eventually impose a matching of the order parameters to their empirical counterparts estimated from the data. Finally, we illustrate our considerations by performing experiments on both artificial and real data, showing in particular how the RBM operates in the ferromagnetic compositional phase.
DIS-NN · Aug 9, 2017
Spectral Dynamics of Learning Restricted Boltzmann Machines
Aurélien Decelle, Giancarlo Fissore, Cyril Furtlehner
The Restricted Boltzmann Machine (RBM), an important tool used in machine learning, in particular for unsupervised learning tasks, is investigated from the perspective of its spectral properties. Starting from empirical observations, we propose a generic statistical ensemble for the weight matrix of the RBM and characterize its mean evolution. This lets us show how, in the linear regime in which the RBM is found to operate at the beginning of the training, the statistical properties of the data drive the selection of the unstable modes of the weight matrix. A set of equations characterizing the non-linear regime is then derived, unveiling in some way how the selected modes interact in later stages of the learning procedure and defining a deterministic learning curve for the RBM.
PR · Dec 23, 2013
Using Latent Binary Variables for Online Reconstruction of Large Scale Systems
Victorin Martin, Jean-Marc Lasgouttes, Cyril Furtlehner
We propose a probabilistic graphical model realizing a minimal encoding of real variables dependencies based on possibly incomplete observation and an empirical cumulative distribution function per variable. The target application is a large scale partially observed system, like e.g. a traffic network, where a small proportion of real valued variables are observed, and the other variables have to be predicted. Our design objective is therefore to have good scalability in a real-time setting. Instead of attempting to encode the dependencies of the system directly in the description space, we propose a way to encode them in a latent space of binary variables, reflecting a rough perception of the observable (congested/non-congested for a traffic road). The method relies in part on message passing algorithms, i.e. belief propagation, but the core of the work concerns the definition of meaningful latent variables associated to the variables of interest and their pairwise dependencies. Numerical experiments demonstrate the applicability of the method in practice.
ML · Jun 27, 2013
Traffic data reconstruction based on Markov random field modeling
Shun Kataoka, Muneki Yasuda, Cyril Furtlehner et al.
We consider the traffic data reconstruction problem. Suppose we have the traffic data of an entire city that are incomplete because some road data are unobserved. The problem is to reconstruct the unobserved parts of the data. In this paper, we propose a new method to reconstruct incomplete traffic data collected from various traffic sensors. Our approach is based on Markov random field modeling of road traffic. The reconstruction is achieved by using a mean-field method and a machine learning method. We numerically verify the performance of our method using realistic simulated traffic data for the real road network of Sendai, Japan.
DIS-NN · Oct 19, 2012
Pairwise MRF Calibration by Perturbation of the Bethe Reference Point
Cyril Furtlehner, Yufei Han, Jean-Marc Lasgouttes et al.
We investigate different ways of generating approximate solutions to the pairwise Markov random field (MRF) selection problem. We focus mainly on the inverse Ising problem, but also discuss the somewhat related inverse Gaussian problem, because both types of MRF are suitable for inference tasks with the belief propagation algorithm (BP) under certain conditions. Our approach consists in taking a Bethe mean-field solution obtained with a maximum spanning tree (MST) of pairwise mutual information, referred to as the Bethe reference point, as the starting point for further perturbation procedures. We consider three different ways of following this idea: in the first one, we iteratively select and calibrate the optimal links to be added starting from the Bethe reference point; the second one is based on the observation that the natural gradient can be computed analytically at the Bethe point; in the third one, assuming no local field and using a low temperature expansion, we develop a dual loop joint model based on a well chosen fundamental cycle basis. We indeed identify a subclass of planar models, which we refer to as Bethe-dual graph models, having possibly many loops but characterized by a singly connected dual factor graph, for which the partition function and the linear response can be computed exactly in $O(N)$ and $O(N^2)$ operations respectively, thanks to a dual weight propagation (DWP) message passing procedure that we set up. When restricted to this subclass of models, the inverse Ising problem, being convex, becomes tractable at any temperature. Experimental tests on various datasets with refined $L_0$ or $L_1$ regularization procedures indicate that these approaches may be competitive and useful alternatives to existing ones.
ML · Jul 19, 2012
Local stability of Belief Propagation algorithm with multiple fixed points
Victorin Martin, Jean-Marc Lasgouttes, Cyril Furtlehner
A number of problems in statistical physics and computer science can be expressed as the computation of marginal probabilities over a Markov random field. Belief propagation, an iterative message-passing algorithm, computes such marginals exactly when the underlying graph is a tree. But it has gained its popularity as an efficient way to approximate them in the more general case, even if it can exhibit multiple fixed points and is not guaranteed to converge. In this paper, we express a new sufficient condition for local stability of a belief propagation fixed point in terms of the graph structure and the belief values at the fixed point. This gives credence to the usual understanding that Belief Propagation performs better on sparse graphs.
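The statement that belief propagation is exact on trees can be checked on the simplest tree, a chain. The sketch below is an illustration only, not the paper's stability analysis: it runs sum-product on a chain of binary variables with random positive potentials and compares the result with brute-force enumeration. The function names and the potential layout (`unary[i]`, `pair[i]`) are hypothetical.

```python
import itertools
import numpy as np

def chain_marginals_bp(unary, pair):
    """Sum-product on a chain; pair[i] couples variables i and i+1."""
    n = len(unary)
    fwd = [np.ones(2) for _ in range(n)]  # message reaching i from the left
    bwd = [np.ones(2) for _ in range(n)]  # message reaching i from the right
    for i in range(1, n):
        m = (fwd[i - 1] * unary[i - 1]) @ pair[i - 1]
        fwd[i] = m / m.sum()  # normalize for numerical stability
    for i in range(n - 2, -1, -1):
        m = pair[i] @ (bwd[i + 1] * unary[i + 1])
        bwd[i] = m / m.sum()
    beliefs = np.array([fwd[i] * unary[i] * bwd[i] for i in range(n)])
    return beliefs / beliefs.sum(axis=1, keepdims=True)

def chain_marginals_brute(unary, pair):
    """Exact marginals by summing over all 2^n configurations."""
    n = len(unary)
    marg = np.zeros((n, 2))
    for x in itertools.product([0, 1], repeat=n):
        w = np.prod([unary[i][x[i]] for i in range(n)])
        w *= np.prod([pair[i][x[i], x[i + 1]] for i in range(n - 1)])
        for i in range(n):
            marg[i, x[i]] += w
    return marg / marg.sum(axis=1, keepdims=True)

# Random positive potentials on a 4-variable chain (toy example).
rng = np.random.default_rng(1)
n = 4
unary = [rng.uniform(0.5, 2.0, size=2) for _ in range(n)]
pair = [rng.uniform(0.5, 2.0, size=(2, 2)) for _ in range(n - 1)]
bp = chain_marginals_bp(unary, pair)
exact = chain_marginals_brute(unary, pair)
```

On a graph with loops the same message updates would have to be iterated and may settle on different fixed points depending on initialization, which is the setting the stability condition addresses.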