Boltzmann machines and energy-based models
This is an incremental review paper for researchers in machine learning, summarizing existing knowledge without introducing new methods or results.
The paper reviews Boltzmann machines and energy-based models, noting that while their mathematical representations are elegant, computing gradients and Hessians is generally intractable, motivating approximate methods like Gibbs sampling and contrastive divergence.
We review Boltzmann machines and energy-based models. A Boltzmann machine defines a probability distribution over binary-valued patterns. Its parameters can be learned with gradient-based methods that increase the log-likelihood of the data. The gradient and Hessian of the log-likelihood of a Boltzmann machine admit beautiful mathematical representations, although computing them is in general intractable. This intractability motivates approximate methods, including Gibbs sampling and contrastive divergence, and tractable alternatives, namely energy-based models.
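
To make these statements concrete, here is a minimal sketch in Python of a small, fully visible Boltzmann machine. It is an illustration under stated assumptions, not the formulation of any particular reference: the energy convention E(x) = -b·x - sum_{i<j} W[i,j] x_i x_j, the strictly upper-triangular weight matrix W, and all function names are choices made here. The exact gradient is computed by enumerating all 2^n binary patterns, which is feasible only for tiny n and is the source of the intractability mentioned above. The Gibbs sweep and the contrastive-divergence estimate replace that enumeration with sampling.

```python
import itertools
import numpy as np

def energy(x, b, W):
    # Assumed convention: E(x) = -b.x - sum_{i<j} W[i,j] x_i x_j,
    # with W strictly upper triangular (zero diagonal).
    return -(x @ b) - x @ W @ x

def model_probs(b, W):
    # Enumerate all 2^n binary patterns; the cost grows exponentially in n.
    X = np.array(list(itertools.product([0, 1], repeat=len(b))), dtype=float)
    E = np.array([energy(x, b, W) for x in X])
    p = np.exp(-(E - E.min()))      # shift energies for numerical stability
    return X, p / p.sum()           # p.sum() plays the role of the partition function

def log_likelihood(data, b, W):
    # Mean log-probability of the data patterns under the model.
    _, p = model_probs(b, W)
    idx = data.astype(int) @ (2 ** np.arange(len(b) - 1, -1, -1))
    return np.mean(np.log(p[idx]))

def exact_gradient(data, b, W):
    # Gradient of the mean log-likelihood:
    # (expectation under the data) - (expectation under the model).
    X, p = model_probs(b, W)
    grad_b = data.mean(axis=0) - p @ X
    data_corr = data.T @ data / len(data)
    model_corr = X.T @ (X * p[:, None])
    return grad_b, np.triu(data_corr - model_corr, k=1)

def gibbs_sweep(x, b, W, rng):
    # Resample each unit in turn from its conditional given all other units.
    S = W + W.T                     # symmetric weights, zero diagonal
    for i in range(len(x)):
        a = b[i] + S[i] @ x
        x[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-a)))
    return x

def cd_gradient(data, b, W, rng, k=1):
    # Contrastive divergence: approximate the model expectation with
    # samples from k Gibbs sweeps, each chain started at a data pattern.
    neg = data.copy()
    for x in neg:                   # rows are views, so neg is updated in place
        for _ in range(k):
            gibbs_sweep(x, b, W, rng)
    grad_b = data.mean(axis=0) - neg.mean(axis=0)
    grad_W = np.triu(data.T @ data - neg.T @ neg, k=1) / len(data)
    return grad_b, grad_W

# Usage: a few steps of exact gradient ascent on random toy data.
rng = np.random.default_rng(0)
n = 3
b, W = np.zeros(n), np.zeros((n, n))
data = rng.integers(0, 2, size=(20, n)).astype(float)
print("log-likelihood before:", log_likelihood(data, b, W))
for _ in range(50):
    gb, gW = exact_gradient(data, b, W)
    b, W = b + 0.1 * gb, W + 0.1 * gW
print("log-likelihood after: ", log_likelihood(data, b, W))
```

Swapping `exact_gradient` for `cd_gradient` in the loop gives a stochastic approximation that avoids the enumeration, at the price of a biased gradient estimate.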