Activation Functions, Statistics and Learning of Higher-Order Interactions in Restricted Boltzmann Machines

arXiv:2605.19178
AI Analysis

For researchers studying representation learning in RBMs, this paper provides analytical insight into how activation functions affect the ability to capture higher-order interactions, though the findings are specific to RBMs.

This work characterizes the statistics of the interactions induced by Restricted Boltzmann Machines (RBMs) with different hidden-unit activation functions. It shows that data with large higher-order (beyond pairwise) interactions are difficult for any RBM to learn, but that rapidly increasing nonlinearities such as the Exponential function can facilitate learning within a specific, analytically determined range of parameters.

The great success of neural networks in recognizing hidden patterns and correlations in complex data lies in the way they jointly exploit a large number of parameters and nonlinear single-unit activations. Restricted Boltzmann Machines (RBMs) provide a simple yet powerful framework for studying the impact of activation nonlinearities on performance and representation. In this work, we exploit the duality between RBMs and models of interacting binary variables to study the statistics of the interactions induced by RBM ensembles with different hidden-unit activation functions. We characterize the space of representable models analytically in terms of the moments of the distribution of induced interactions for four commonly used activation functions: Linear, Step, ReLU, and Exponential. The quantitative predictions of the analytical calculations on learning agree very well with simulations of the training process. In particular, our analysis shows that certain data structures, namely those generated by models of interacting variables with large interaction terms beyond pairwise, are difficult to represent, and thus to learn, for any RBM. Yet we find that rapidly increasing nonlinearities, such as the Exponential function, can facilitate the representation and learning of such data structures for a specific range of parameters that is determined analytically.
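
The duality mentioned in the abstract can be illustrated numerically for a very small RBM. The following is a minimal sketch under stated assumptions, not the paper's method. It assumes visible spins s_i in {-1, +1} and that summing out the hidden units leaves an unnormalised log-probability of the form sum_mu Gamma(sum_i W[mu, i] * s_i), where Gamma depends on the hidden-unit type. That function is then expanded in products of spins, and the coefficients are read as the induced interactions of each order. The names `induced_interactions`, `gamma_gaussian`, `gamma_binary` and the weight matrix `W` are hypothetical. The two choices of Gamma are standard textbook cases picked for illustration (Gaussian hidden units give Gamma(x) = x^2 / 2, binary ±1 hidden units give Gamma(x) = log(2 cosh x)); they are not claimed to match the paper's definitions of Linear, Step, ReLU or Exponential.

```python
import itertools
import numpy as np


def induced_interactions(W, gamma):
    """Expand sum_mu gamma(W[mu] @ s) in monomials of the spins s_i = +/-1.

    Returns a dict mapping each subset A of visible indices (a tuple) to the
    coefficient J_A of prod_{i in A} s_i. Brute force: enumerates all 2^N states.
    """
    n_visible = W.shape[1]
    # All 2^N visible configurations, one per row.
    states = np.array(list(itertools.product([-1, 1], repeat=n_visible)))
    # Unnormalised log-probability of every configuration.
    log_p = gamma(states @ W.T).sum(axis=1)
    interactions = {}
    for order in range(n_visible + 1):
        for subset in itertools.combinations(range(n_visible), order):
            # Product of the spins in the subset (all ones for the empty set).
            monomial = np.ones(len(states))
            for i in subset:
                monomial = monomial * states[:, i]
            # Spin monomials are orthogonal, so projecting gives J_A.
            interactions[subset] = float(np.mean(log_p * monomial))
    return interactions


def gamma_gaussian(x):
    # Assumption: unit-variance Gaussian hidden units (linear mean response).
    return 0.5 * x**2


def gamma_binary(x):
    # Assumption: binary +/-1 hidden units; log(2 cosh x), computed stably.
    return np.logaddexp(x, -x)


def max_abs_by_order(interactions):
    """Largest |J_A| for each interaction order |A|."""
    out = {}
    for subset, value in interactions.items():
        out[len(subset)] = max(out.get(len(subset), 0.0), abs(value))
    return out


# Hypothetical weights: 2 hidden units, 4 visible units.
W = np.array([[0.3, -0.2, 0.4, 0.25],
              [0.1, 0.35, -0.3, 0.45]])
J_gauss = induced_interactions(W, gamma_gaussian)
J_binary = induced_interactions(W, gamma_binary)
print("Gaussian hidden units:", max_abs_by_order(J_gauss))
print("Binary hidden units:  ", max_abs_by_order(J_binary))
```

In this sketch the Gaussian case yields only pairwise couplings, equal to the off-diagonal entries of `W.T @ W`, while the binary case also produces a nonzero four-body term. That is the sense in which the hidden-unit nonlinearity controls which higher-order interactions an RBM can induce. The enumeration costs on the order of 4^N operations, so it is only meant for a handful of visible units.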
