A random energy approach to deep learning
This work offers theoretical insight into deep learning optimization that could help researchers develop more efficient training algorithms, although it appears incremental because it builds on existing random energy models.
The paper addresses how statistical dependence propagates through deep belief networks. It shows that efficient training requires each layer to be tuned close to a critical point, which results in a broad distribution of energy levels. An analysis of Deep Belief Networks and Restricted Boltzmann Machines on different datasets confirms these conclusions.
We study a generic ensemble of deep belief networks which is parametrized by the distribution of energy levels of the hidden states of each layer. We show that, within a random energy approach, statistical dependence can propagate from the visible to deep layers only if each layer is tuned close to the critical point during learning. As a consequence, efficiently trained learning machines are characterised by a broad distribution of energy levels. The analysis of Deep Belief Networks and Restricted Boltzmann Machines on different datasets confirms these conclusions.
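To make the notion of a critical point in a random energy setting more concrete, the sketch below uses the textbook random energy model, not the ensemble studied in the paper. It assumes 2^N states with independent Gaussian energies of variance N/2, for which the known critical inverse temperature is beta_c = 2 sqrt(ln 2). The function names `sample_rem_energies` and `gibbs_entropy`, the system size, and the seed are illustrative choices and do not come from the paper.

```python
import math
import random


def sample_rem_energies(n_units, rng):
    """Draw 2**n_units i.i.d. Gaussian energies with variance n_units / 2.

    This is the textbook random energy model, used here only as an
    assumed stand-in for the energy levels of one layer's hidden states.
    """
    sigma = math.sqrt(n_units / 2.0)
    return [rng.gauss(0.0, sigma) for _ in range(2 ** n_units)]


def gibbs_entropy(energies, beta):
    """Shannon entropy (in nats) of p(s) proportional to exp(-beta * E_s)."""
    e_min = min(energies)  # shift energies so the largest weight is 1
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    probs = [w / z for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


rng = random.Random(0)   # fixed seed, illustrative
n_units = 12             # 4096 states, small enough for pure Python
energies = sample_rem_energies(n_units, rng)

# Critical inverse temperature of the textbook model with variance N / 2.
beta_c = 2.0 * math.sqrt(math.log(2.0))

betas = [0.0, 0.5 * beta_c, beta_c, 2.0 * beta_c]
entropies = [gibbs_entropy(energies, b) for b in betas]

for b, s in zip(betas, entropies):
    print(f"beta / beta_c = {b / beta_c:.1f}   entropy per unit = {s / n_units:.3f}")
```

In this toy model the entropy per unit equals ln 2 at beta = 0 and decreases as beta grows. In the large-N limit it vanishes above beta_c, where a few low-energy states dominate. At finite N the transition is smoothed, so the printed values only indicate the trend.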