Hajime Yoshino

DIS-NN
3papers
13citations
Novelty65%
AI Score38

3 Papers

DIS-NNFeb 15, 2023
Spatially heterogeneous learning by a deep student machine

Hajime Yoshino

Deep neural networks (DNN) with a huge number of adjustable parameters remain largely black boxes. To shed light on the hidden layers of DNN, we study supervised learning by a DNN of width $N$ and depth $L$ consisting of $NL$ perceptrons with $c$ inputs by a statistical mechanics approach called the teacher-student setting. We consider an ensemble of student machines that exactly reproduce $M$ sets of $N$ dimensional input/output relations provided by a teacher machine. We show that the problem becomes exactly solvable in what we call as 'dense limit': $N \gg c \gg 1$ and $M \gg 1$ with fixed $α=M/c$ using the replica method developed in (H. Yoshino, (2020)). We also study the model numerically performing simple greedy MC simulations. Simulations reveal that learning by the DNN is quite heterogeneous in the network space: configurations of the teacher and the student machines are more correlated within the layers closer to the input/output boundaries while the central region remains much less correlated due to the over-parametrization in qualitative agreement with the theoretical prediction. We evaluate the generalization-error of the DNN with various depth $L$ both theoretically and numerically. Remarkably both the theory and simulation suggest generalization-ability of the student machines, which are only weakly correlated with the teacher in the center, does not vanish even in the deep limit $L \gg 1$ where the system becomes heavily over-parametrized. We also consider the impact of effective dimension $D(\leq N)$ of data by incorporating the hidden manifold model (S. Goldt et. al., (2020)) into our model. The theory implies that the loop corrections to the dense limit become enhanced by either decreasing the width $N$ or decreasing the effective dimension $D$ of the data. Simulation suggests both lead to significant improvements in generalization-ability.

MLOct 18, 2025
Graphical model for tensor factorization by sparse sampling

Angelo Giorgio Cavaliere, Riki Nagasawa, Shuta Yokoi et al.

We consider tensor factorizations based on sparse measurements of the tensor components. The measurements are designed in a way that the underlying graph of interactions is a random graph. The setup will be useful in cases where a substantial amount of data is missing, as in recommendation systems heavily used in social network services. In order to obtain theoretical insights on the setup, we consider statistical inference of the tensor factorization in a high dimensional limit, which we call as dense limit, where the graphs are large and dense but not fully connected. We build message-passing algorithms and test them in a Bayes optimal teacher-student setting. We also develop a replica theory, which becomes exact in the dense limit,to examine the performance of statistical inference.

DIS-NNOct 22, 2019
From complex to simple : hierarchical free-energy landscape renormalized in deep neural networks

Hajime Yoshino

We develop a statistical mechanical approach based on the replica method to study the design space of deep and wide neural networks constrained to meet a large number of training data. Specifically, we analyze the configuration space of the synaptic weights and neurons in the hidden layers in a simple feed-forward perceptron network for two scenarios: a setting with random inputs/outputs and a teacher-student setting. By increasing the strength of constraints,~i.e. increasing the number of training data, successive 2nd order glass transition (random inputs/outputs) or 2nd order crystalline transition (teacher-student setting) take place layer-by-layer starting next to the inputs/outputs boundaries going deeper into the bulk with the thickness of the solid phase growing logarithmically with the data size. This implies the typical storage capacity of the network grows exponentially fast with the depth. In a deep enough network, the central part remains in the liquid phase. We argue that in systems of finite width N, the weak bias field can remain in the center and plays the role of a symmetry-breaking field that connects the opposite sides of the system. The successive glass transitions bring about a hierarchical free-energy landscape with ultrametricity, which evolves in space: it is most complex close to the boundaries but becomes renormalized into progressively simpler ones in deeper layers. These observations provide clues to understand why deep neural networks operate efficiently. Finally, we present some numerical simulations of learning which reveal spatially heterogeneous glassy dynamics truncated by a finite width $N$ effect.