The emergence of a concept in shallow neural networks

arXiv:2109.00454v153 citations
Originality: Incremental advance
AI Analysis

This paper addresses a foundational open problem in machine learning by providing analytical insight into the sample complexity of learning in shallow networks. The advance is incremental, since it builds on existing theories.

The paper tackles the problem of determining a critical sample size for shallow neural networks to learn archetypes from blurred data, showing that beyond this threshold, restricted Boltzmann machines can effectively function as generative models or classifiers.

We consider restricted Boltzmann machines (RBMs) trained over an unstructured dataset made of blurred copies of definite but unavailable "archetypes", and we show that there exists a critical sample size beyond which the RBM can learn the archetypes, namely the machine can successfully operate as a generative model or as a classifier, according to the operational routine. In general, assessing a critical sample size (possibly in relation to the quality of the dataset) is still an open problem in machine learning. Here, restricting to the random theory, where shallow networks suffice and the grandmother-cell scenario is correct, we leverage the formal equivalence between RBMs and Hopfield networks to obtain a phase diagram for both neural architectures. The diagram highlights regions, in the space of the control parameters (i.e., number of archetypes, number of neurons, size and quality of the training set), where learning can be accomplished. Our investigations are led by analytical methods based on the statistical mechanics of disordered systems, and the results are further corroborated by extensive Monte Carlo simulations.
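The setting described in the abstract can be illustrated with a small toy experiment. The sketch below is not the paper's code or training routine. It assumes a Hopfield-style network whose Hebbian couplings are built directly from the blurred examples, and it assumes that dataset quality is a parameter r such that each entry of an example agrees with its archetype with probability (1 + r) / 2. All function names and parameter values are hypothetical.

```python
import random


def blur(archetype, r, rng):
    """Noisy copy: each entry keeps its sign with probability (1 + r) / 2 (assumed noise model)."""
    keep = (1 + r) / 2
    return [x if rng.random() < keep else -x for x in archetype]


def overlap(a, b):
    """Normalised overlap between two +/-1 vectors, in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def retrieve(examples, start, max_sweeps=20):
    """Zero-temperature parallel dynamics under Hebbian couplings built from the examples."""
    n = len(start)
    state = list(start)
    for _ in range(max_sweeps):
        # Local field h_i = sum_e eta^e_i * (eta^e . state). The positive
        # normalisation constant does not change the sign, so it is dropped.
        proj = [sum(x * s for x, s in zip(e, state)) for e in examples]
        new_state = [
            1 if sum(p * e[i] for p, e in zip(proj, examples)) >= 0 else -1
            for i in range(n)
        ]
        if new_state == state:  # fixed point reached
            break
        state = new_state
    return state


def archetype_overlap(m, r, k=3, n=200, seed=0):
    """Train on m blurred examples per archetype, then test retrieval of archetype 0."""
    rng = random.Random(seed)
    archetypes = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
    # The archetypes themselves are never shown to the network.
    examples = [blur(a, r, rng) for a in archetypes for _ in range(m)]
    probe = blur(archetypes[0], r, rng)  # fresh noisy copy, not in the training set
    return overlap(retrieve(examples, probe), archetypes[0])


if __name__ == "__main__":
    for m in (1, 50):
        print(m, archetype_overlap(m, r=0.6))
```

With a single example per archetype, the dynamics would be expected to settle on a stored noisy example, so the overlap with the true archetype stays close to r. With many examples per archetype, the noise averages out in the couplings and the final state should lie much closer to the archetype itself. This is a qualitative picture only and makes no claim about the critical sample size derived in the paper.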

