DIS-NN STAT-MECH LG
Jun 14, 2024

Fundamental operating regimes, hyper-parameter fine-tuning and glassiness: towards an interpretable replica-theory for trained restricted Boltzmann machines

Alberto Fachechi, Elena Agliari, Miriam Aquaro, Anthony Coolen, Menno Mulder

arXiv:2406.09924v1
4.35 citations

Originality: Incremental advance

AI Analysis

This work addresses interpretability and hyperparameter tuning for restricted Boltzmann machines. It is rated an incremental advance because it builds on existing statistical mechanics approaches.

The paper tackled the problem of understanding the generative capabilities of restricted Boltzmann machines trained on noisy data by developing a statistical mechanics framework using the replica trick. It identified key hyperparameters that control different operational regimes and provided evidence for replica-symmetry breaking in certain hyperparameter regions.

We consider restricted Boltzmann machines with a binary visible layer and a Gaussian hidden layer trained on an unlabelled dataset composed of noisy realizations of a single ground pattern. We develop a statistical mechanics framework to describe the network's generative capabilities by exploiting the replica trick and assuming self-averaging of the underlying order parameters (i.e., replica symmetry). In particular, we outline the effective control parameters (e.g., the relative number of weights to be trained, the regularization parameter), whose tuning can yield qualitatively different operating regimes. Further, we provide analytical and numerical evidence for the existence of a sub-region in the space of the hyperparameters where replica-symmetry breaking occurs.
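The setting in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and every name and convention in it is an assumption made for illustration: the ground pattern is taken to be a ±1 vector, noise is modelled as independent sign flips with probability `flip_prob`, and the Gaussian hidden units are taken to have unit variance and no biases, so that integrating them out leaves a weight on visible configurations that is quadratic in the hidden-layer input fields. The paper's own noise parametrization and normalization may differ.

```python
import numpy as np


def make_noisy_dataset(n_visible, n_examples, flip_prob, seed=0):
    """Draw a random +/-1 ground pattern and noisy copies of it.

    Assumption: each entry of each example is flipped independently
    with probability `flip_prob` (a hypothetical noise model).
    """
    rng = np.random.default_rng(seed)
    ground = rng.choice([-1, 1], size=n_visible)
    flips = rng.random((n_examples, n_visible)) < flip_prob
    examples = np.where(flips, -ground, ground)
    return ground, examples


def overlaps(ground, configs):
    """Normalized overlap of each configuration with the ground pattern."""
    return configs @ ground / ground.size


def visible_log_weight(weights, configs):
    """Unnormalized log-probability of visible configurations.

    Assumption: unit-variance Gaussian hidden units without biases.
    Integrating exp(-z**2 / 2 + z * h) over z gives a factor
    proportional to exp(h**2 / 2) per hidden unit, where h is the
    input field that the unit receives from the visible layer.
    `weights` has shape (n_visible, n_hidden).
    """
    fields = configs @ weights
    return 0.5 * np.sum(fields ** 2, axis=-1)


if __name__ == "__main__":
    ground, examples = make_noisy_dataset(n_visible=200, n_examples=50, flip_prob=0.1)
    print("mean overlap with the ground pattern:", overlaps(ground, examples).mean())
```

The overlap with the ground pattern is the kind of order parameter that a replica calculation tracks. Here it is only measured on the training examples, which gives a rough sense of how noisy the dataset is.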
