Hierarchical Models as Marginals of Hierarchical Models
This work addresses how compactly restricted Boltzmann machines can represent distributions of binary variables. It lowers the number of hidden units known to suffice for universal approximation, a theoretical improvement that is incremental in nature.
The paper tackles the problem of representing hierarchical models as marginals of simpler hierarchical models with fewer interactions, focusing on binary variables and pairwise interaction models. It shows that a restricted Boltzmann machine with fewer than $[2(\log(v)+1)/(v+1)]2^v-1$ hidden binary variables can approximate any distribution of $v$ visible binary variables arbitrarily well, improving upon the previous best bound of $2^{v-1}-1$.
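To make the comparison of the two bounds concrete, the following minimal sketch evaluates both expressions for a few values of $v$. The function names are illustrative, and the base of the logarithm is an assumption (base 2 by default, exposed as a parameter), since the text writes only $\log(v)$.

```python
import math


def previous_bound(v: int) -> int:
    """Previously known sufficient number of hidden units: 2^(v-1) - 1."""
    return 2 ** (v - 1) - 1


def new_bound(v: int, log_base: float = 2.0) -> float:
    """New expression [2(log(v)+1)/(v+1)] 2^v - 1.

    The base of the logarithm is an assumption of this sketch;
    pass log_base=math.e to evaluate it with the natural logarithm.
    """
    return 2.0 * (math.log(v, log_base) + 1.0) / (v + 1) * 2 ** v - 1.0


if __name__ == "__main__":
    # Print both bounds and their ratio for a few numbers of visible units.
    for v in (5, 10, 20, 30, 40):
        old, new = previous_bound(v), new_bound(v)
        print(f"v={v:3d}  previous={old:.3e}  new={new:.3e}  ratio={new / old:.3f}")
```

For base 2 and base $e$ alike, the new expression is larger than $2^{v-1}-1$ for small $v$ (for example $v=3$) and smaller for large $v$ (for example $v=30$), so the improvement shows up as $v$ grows.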
We investigate the representation of hierarchical models in terms of marginals of other hierarchical models with smaller interactions. We focus on binary variables and marginals of pairwise interaction models whose hidden variables are conditionally independent given the visible variables. In this case the problem is equivalent to the representation of linear subspaces of polynomials by feedforward neural networks with soft-plus computational units. We show that every hidden variable can freely model multiple interactions among the visible variables, which allows us to generalize and improve previous results. In particular, we show that a restricted Boltzmann machine with fewer than $[ 2(\log(v)+1) / (v+1) ] 2^v-1$ hidden binary variables can approximate every distribution of $v$ visible binary variables arbitrarily well, compared to $2^{v-1}-1$ from the best previously known result.
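The connection to soft-plus units can be checked numerically. For a restricted Boltzmann machine with unnormalized log-probability $b^\top x + c^\top h + h^\top W x$ (this parametrization and all names below are assumptions made for illustration, not notation taken from the text), summing out the binary hidden vector $h$ gives $b^\top x + \sum_j \operatorname{softplus}(c_j + W_j x)$ as the unnormalized log-probability of the visible state $x$. The sketch below compares this closed form against a brute-force sum over all hidden states.

```python
import itertools

import numpy as np


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)


def log_marginal_bruteforce(x, W, b, c):
    """log sum_h exp(b.x + c.h + h^T W x), enumerating every binary h."""
    m = W.shape[0]
    terms = []
    for h in itertools.product([0, 1], repeat=m):
        h = np.array(h, dtype=float)
        terms.append(b @ x + c @ h + h @ W @ x)
    return np.logaddexp.reduce(terms)


def log_marginal_softplus(x, W, b, c):
    """Closed form: linear term plus one soft-plus unit per hidden variable."""
    return b @ x + softplus(c + W @ x).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_visible, n_hidden = 4, 3  # hypothetical sizes for the check
    W = rng.normal(size=(n_hidden, n_visible))
    b = rng.normal(size=n_visible)
    c = rng.normal(size=n_hidden)
    # Largest discrepancy between the two computations over all visible states.
    gap = max(
        abs(
            log_marginal_bruteforce(np.array(x, dtype=float), W, b, c)
            - log_marginal_softplus(np.array(x, dtype=float), W, b, c)
        )
        for x in itertools.product([0, 1], repeat=n_visible)
    )
    print("max abs difference:", gap)
```

The two computations agree up to floating-point error because the hidden variables are conditionally independent given the visible ones, so the sum over $h$ factorizes into one term $1 + \exp(c_j + W_j x)$ per hidden unit.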