DIS-NNSep 11, 2023
Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problemsMaria Chiara Angelini, Angelo Giorgio Cavaliere, Raffaele Marino et al.
Is Stochastic Gradient Descent (SGD) substantially different from Metropolis Monte Carlo dynamics? This is a fundamental question at the time of understanding the most used training algorithm in the field of Machine Learning, but it received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm resemble very closely that of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g.\ SGD does not satisfy detailed balance). Such equivalence allows us to use results about performances and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.
DIS-NNNov 17, 2023
Stable Attractors for Neural networks classification via Ordinary Differential Equations (SA-nODE)Raffaele Marino, Lorenzo Giambagli, Lorenzo Chicchi et al.
A novel approach for supervised classification is presented which sits at the intersection of machine learning and dynamical systems theory. At variance with other methodologies that employ ordinary differential equations for classification purposes, the untrained model is a priori constructed to accommodate for a set of pre-assigned stationary stable attractors. Classifying amounts to steer the dynamics towards one of the planted attractors, depending on the specificity of the processed item supplied as an input. Asymptotically the system will hence converge on a specific point of the explored multi-dimensional space, flagging the category of the object to be eventually classified. Working in this context, the inherent ability to perform classification, as acquired ex post by the trained model, is ultimately reflected in the shaped basin of attractions associated to each of the target stable attractors. The performance of the proposed method is here challenged against simple toy models crafted for the purpose, as well as by resorting to well established reference standards. Although this method does not reach the performance of state-of-the-art deep learning algorithms, it illustrates that continuous dynamical systems with closed analytical interaction terms can serve as high-performance classifiers.
DIS-NNFeb 20Code
Benchmarking Graph Neural Networks in Solving Hard Constraint Satisfaction ProblemsGeri Skenderi, Lorenzo Buffoni, Francesco D'Amico et al.
Graph neural networks (GNNs) are increasingly applied to hard optimization problems, often claiming superiority over classical heuristics. However, such claims risk being unsolid due to a lack of standard benchmarks on truly hard instances. From a statistical physics perspective, we propose new hard benchmarks based on random problems. We provide these benchmarks, along with performance results from both classical heuristics and GNNs. Our fair comparison shows that classical algorithms still outperform GNNs. We discuss the challenges for neural networks in this domain. Future claims of superiority can be made more robust using our benchmarks, available at https://github.com/ArtLabBocconi/RandCSPBench.
DIS-NNMay 11
Exact Fixed-Point Constraints in Neural-ODEs with Provable UniversalityFeliciano Giuseppe Pacifico, Duccio Fanelli, Lorenzo Buffoni et al.
We introduce a technique that enables Neural-ODEs to approximate arbitrary velocity fields with a priori planted fixed-points. Specifically, a recipe is given to explicitly accommodate for a finite collection of points in the reference multi-dimensional space of the Neural-ODE where the velocity field is exactly equal to zero. In this way, the gradient-based training is rigorously constrained inside the prescribed hypothesis class while leaving the expressive power of the Neural-ODE unaltered. We rigorously prove the universality of the Neural-ODE under any local constraints in the velocity field and give a computationally convenient way of imposing the fixed points. Our method is then tested on two paradigmatic physical models.
CLSep 17, 2024
Fast Analysis of the OpenAI O1-Preview Model in Solving Random K-SAT Problem: Does the LLM Solve the Problem Itself or Call an External SAT Solver?Raffaele Marino
In this manuscript, I present an analysis on the performance of OpenAI O1-preview model in solving random K-SAT instances for K$\in {2,3,4}$ as a function of $α=M/N$ where $M$ is the number of clauses and $N$ is the number of variables of the satisfiable problem. I show that the model can call an external SAT solver to solve the instances, rather than solving them directly. Despite using external solvers, the model reports incorrect assignments as output. Moreover, I propose and present an analysis to quantify whether the OpenAI O1-preview model demonstrates a spark of intelligence or merely makes random guesses when outputting an assignment for a Boolean satisfiability problem.
AIMar 13, 2024
A Short Review on Novel Approaches for Maximum Clique Problem: from Classical algorithms to Graph Neural Networks and Quantum algorithmsRaffaele Marino, Lorenzo Buffoni, Bogdan Zavalnij
This manuscript provides a comprehensive review of the Maximum Clique Problem, a computational problem that involves finding subsets of vertices in a graph that are all pairwise adjacent to each other. The manuscript covers in a simple way classical algorithms for solving the problem and includes a review of recent developments in graph neural networks and quantum algorithms. The review concludes with benchmarks for testing classical as well as new learning, and quantum algorithms.
LGDec 12, 2023
Complex Recurrent Spectral NetworkLorenzo Chicchi, Lorenzo Giambagli, Lorenzo Buffoni et al.
This paper presents a novel approach to advancing artificial intelligence (AI) through the development of the Complex Recurrent Spectral Network ($\mathbb{C}$-RSN), an innovative variant of the Recurrent Spectral Network (RSN) model. The $\mathbb{C}$-RSN is designed to address a critical limitation in existing neural network models: their inability to emulate the complex processes of biological neural networks dynamically and accurately. By integrating key concepts from dynamical systems theory and leveraging principles from statistical mechanics, the $\mathbb{C}$-RSN model introduces localized non-linearity, complex fixed eigenvalues, and a distinct separation of memory and input processing functionalities. These features collectively enable the $\mathbb{C}$-RSN evolving towards a dynamic, oscillating final state that more closely mirrors biological cognition. Central to this work is the exploration of how the $\mathbb{C}$-RSN manages to capture the rhythmic, oscillatory dynamics intrinsic to biological systems, thanks to its complex eigenvalue structure and the innovative segregation of its linear and non-linear components. The model's ability to classify data through a time-dependent function, and the localization of information processing, is demonstrated with an empirical evaluation using the MNIST dataset. Remarkably, distinct items supplied as a sequential input yield patterns in time which bear the indirect imprint of the insertion order (and of the time of separation between contiguous insertions).
CLMar 13, 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language ModelsCarlo Nicolini, Jacopo Staiano, Bruno Lepri et al.
A substantial gap persists in understanding the reasons behind the exceptional performance of the Transformer architecture in NLP. A particularly unexplored area involves the mechanistic description of how the distribution of parameters evolves over time during training. In this work we suggest that looking at the time evolution of the statistic distribution of model parameters, and specifically at bifurcation effects, can help understanding the model quality, potentially reducing training costs and evaluation efforts and empirically showing the reasons behind the effectiveness of weights sparsification.
LGDec 22, 2023
Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testingRaffaele Marino, Lorenzo Buffoni, Lorenzo Chicchi et al.
EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) is a novel approach at the intersection of machine learning and dynamical systems theory, presenting a unique framework for classification tasks [1]. This method stands out with its dynamical system structure, utilizing ordinary differential equations (ODEs) to efficiently handle complex classification challenges. The paper delves into EODECA's dynamical properties, emphasizing its resilience against random perturbations and robust performance across various classification scenarios. Notably, EODECA's design incorporates the ability to embed stable attractors in the phase space, enhancing reliability and allowing for reversible dynamics. In this paper, we carry out a comprehensive analysis by expanding on the work [1], and employing a Euler discretization scheme. In particular, we evaluate EODECA's performance across five distinct classification problems, examining its adaptability and efficiency. Significantly, we demonstrate EODECA's effectiveness on the MNIST and Fashion MNIST datasets, achieving impressive accuracies of $98.06\%$ and $88.21\%$, respectively. These results are comparable to those of a multi-layer perceptron (MLP), underscoring EODECA's potential in complex data processing tasks. We further explore the model's learning journey, assessing its evolution in both pre and post training environments and highlighting its ability to navigate towards stable attractors. The study also investigates the invertibility of EODECA, shedding light on its decision-making processes and internal workings. This paper presents a significant step towards a more transparent and robust machine learning paradigm, bridging the gap between machine learning algorithms and dynamical systems methodologies.
NCJun 24, 2024
Learning in Wilson-Cowan model for metapopulationRaffaele Marino, Lorenzo Buffoni, Lorenzo Chicchi et al.
The Wilson-Cowan model for metapopulation, a Neural Mass Network Model, treats different subcortical regions of the brain as connected nodes, with connections representing various types of structural, functional, or effective neuronal connectivity between these regions. Each region comprises interacting populations of excitatory and inhibitory cells, consistent with the standard Wilson-Cowan model. By incorporating stable attractors into such a metapopulation model's dynamics, we transform it into a learning algorithm capable of achieving high image and text classification accuracy. We test it on MNIST and Fashion MNIST, in combination with convolutional neural networks, on CIFAR-10 and TF-FLOWERS, and, in combination with a transformer architecture (BERT), on IMDB, always showing high classification accuracy. These numerical evaluations illustrate that minimal modifications to the Wilson-Cowan model for metapopulation can reveal unique and previously unobserved dynamics.
LGJun 3, 2024
Estimating Global Input Relevance and Enforcing Sparse Representations with a Scalable Spectral Neural Network ApproachLorenzo Chicchi, Lorenzo Buffoni, Diego Febbe et al.
In machine learning practice it is often useful to identify relevant input features. Isolating key input elements, ranked according their respective degree of relevance, can help to elaborate on the process of decision making. Here, we propose a novel method to estimate the relative importance of the input components for a Deep Neural Network. This is achieved by leveraging on a spectral re-parametrization of the optimization process. Eigenvalues associated to input nodes provide in fact a robust proxy to gauge the relevance of the supplied entry features. Notably, the spectral features ranking is performed automatically, as a byproduct of the network training, with no additional processing to be carried out. Moreover, by leveraging on the regularization of the eigenvalues, it is possible to enforce solutions making use of a minimum subset of the input components, increasing the explainability of the model and providing sparse input representations. The technique is compared to the most common methods in the literature and is successfully challenged against both synthetic and real data.
DIS-NNMay 10, 2023
Phase transitions in the mini-batch size for sparse and dense two-layer neural networksRaffaele Marino, Federico Ricci-Tersenghi
The use of mini-batches of data in training artificial neural networks is nowadays very common. Despite its broad usage, theories explaining quantitatively how large or small the optimal mini-batch size should be are missing. This work presents a systematic attempt at understanding the role of the mini-batch size in training two-layer neural networks. Working in the teacher-student scenario, with a sparse teacher, and focusing on tasks of different complexity, we quantify the effects of changing the mini-batch size $m$. We find that often the generalization performances of the student strongly depend on $m$ and may undergo sharp phase transitions at a critical value $m_c$, such that for $m<m_c$ the training process fails, while for $m>m_c$ the student learns perfectly or generalizes very well the teacher. Phase transitions are induced by collective phenomena firstly discovered in statistical mechanics and later observed in many fields of science. Observing a phase transition by varying the mini-batch size across different architectures raises several questions about the role of this hyperparameter in the neural network learning process.
DIS-NNNov 26, 2021
Nonequilibrium Monte Carlo for unfreezing variables in hard combinatorial optimizationMasoud Mohseni, Daniel Eppens, Johan Strumpfer et al.
Optimizing highly complex cost/energy functions over discrete variables is at the heart of many open problems across different scientific disciplines and industries. A major obstacle is the emergence of many-body effects among certain subsets of variables in hard instances leading to critical slowing down or collective freezing for known stochastic local search strategies. An exponential computational effort is generally required to unfreeze such variables and explore other unseen regions of the configuration space. Here, we introduce a quantum-inspired family of nonlocal Nonequilibrium Monte Carlo (NMC) algorithms by developing an adaptive gradient-free strategy that can efficiently learn key instance-wise geometrical features of the cost function. That information is employed on-the-fly to construct spatially inhomogeneous thermal fluctuations for collectively unfreezing variables at various length scales, circumventing costly exploration versus exploitation trade-offs. We apply our algorithm to two of the most challenging combinatorial optimization problems: random k-satisfiability (k-SAT) near the computational phase transitions and Quadratic Assignment Problems (QAP). We observe significant speedup and robustness over both specialized deterministic solvers and generic stochastic solvers. In particular, for 90% of random 4-SAT instances we find solutions that are inaccessible for the best specialized deterministic algorithm known as Survey Propagation (SP) with an order of magnitude improvement in the quality of solutions for the hardest 10% instances. We also demonstrate two orders of magnitude improvement in time-to-solution over the state-of-the-art generic stochastic solver known as Adaptive Parallel Tempering (APT).
AIDec 10, 2020
Learning from Survey Propagation: a Neural Network for MAX-E-$3$-SATRaffaele Marino
Many natural optimization problems are NP-hard, which implies that they are probably hard to solve exactly in the worst-case. However, it suffices to get reasonably good solutions for all (or even most) instances in practice. This paper presents a new algorithm for computing approximate solutions in ${Θ(N})$ for the Maximum Exact 3-Satisfiability (MAX-E-$3$-SAT) problem by using deep learning methodology. This methodology allows us to create a learning algorithm able to fix Boolean variables by using local information obtained by the Survey Propagation algorithm. By performing an accurate analysis, on random CNF instances of the MAX-E-$3$-SAT with several Boolean variables, we show that this new algorithm, avoiding any decimation strategy, can build assignments better than a random one, even if the convergence of the messages is not found. Although this algorithm is not competitive with state-of-the-art Maximum Satisfiability (MAX-SAT) solvers, it can solve substantially larger and more complicated problems than it ever saw during training.
NADec 9, 2020
Solving non-linear Kolmogorov equations in large dimensions by using deep learning: a numerical comparison of discretization schemesNicolas Macris, Raffaele Marino
Non-linear partial differential Kolmogorov equations are successfully used to describe a wide range of time dependent phenomena, in natural sciences, engineering or even finance. For example, in physical systems, the Allen-Cahn equation describes pattern formation associated to phase transitions. In finance, instead, the Black-Scholes equation describes the evolution of the price of derivative investment instruments. Such modern applications often require to solve these equations in high-dimensional regimes in which classical approaches are ineffective. Recently, an interesting new approach based on deep learning has been introduced by E, Han, and Jentzen [1][2]. The main idea is to construct a deep network which is trained from the samples of discrete stochastic differential equations underlying Kolmogorov's equation. The network is able to approximate, numerically at least, the solutions of the Kolmogorov equation with polynomial complexity in whole spatial domains. In this contribution we study variants of the deep networks by using different discretizations schemes of the stochastic differential equation. We compare the performance of the associated networks, on benchmarked examples, and show that, for some discretization schemes, improvements in the accuracy are possible without affecting the observed computational complexity.
CCAug 20, 2015
The backtracking survey propagation algorithm for solving random K-SAT problemsRaffaele Marino, Giorgio Parisi, Federico Ricci-Tersenghi
Discrete combinatorial optimization has a central role in many scientific disciplines, however, for hard problems we lack linear time algorithms that would allow us to solve very large instances. Moreover, it is still unclear what are the key features that make a discrete combinatorial optimization problem hard to solve. Here we study random K-satisfiability problems with $K=3,4$, which are known to be very hard close to the SAT-UNSAT threshold, where problems stop having solutions. We show that the backtracking survey propagation algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the conjecture that only unfrozen solutions can be found in linear time, and that a problem becomes impossible to solve in linear time when all solutions contain frozen variables.