LG · Feb 24, 2022

Clarifying MCMC-based training of modern EBMs: Contrastive Divergence versus Maximum Likelihood

arXiv:2202.12176v1
Originality Synthesis-oriented
AI Analysis

This work resolves theoretical misunderstandings in EBM training for researchers, but it is incremental as it clarifies existing methods rather than introducing new ones.

The paper addresses confusion surrounding modern Energy-Based Models (EBMs) by identifying theoretical errors in recent influential papers regarding Contrastive Divergence (CD) versus Maximum Likelihood training, and offers a reinterpretation of their methods supported by illustrative experiments.

The Energy-Based Model (EBM) framework is a very general approach to generative modeling that tries to learn and exploit probability distributions defined only through unnormalized scores. It has risen in popularity recently thanks to the impressive results obtained in image generation by parameterizing the distribution with Convolutional Neural Networks (CNNs). However, the motivation and theoretical foundations behind modern EBMs are often absent from recent papers, and this sometimes results in confusion. In particular, the theoretical justifications behind the popular MCMC-based learning algorithm Contrastive Divergence (CD) are often glossed over, and we find that this leads to theoretical errors in recent influential papers (Du & Mordatch, 2019; Du et al., 2020). After offering a first-principles introduction to MCMC-based training, we argue that the learning algorithm they use cannot in fact be described as CD, and we reinterpret their methods in light of a new interpretation. Finally, we discuss the implications of our new interpretation and provide some illustrative experiments.
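
To make the MCMC-based training mentioned in the abstract concrete, here is a minimal sketch on a toy problem. It is not the paper's code or its experiments. It assumes a one-dimensional energy E_theta(x) = 0.5 * (x - theta)^2 (so the model is a unit-variance Gaussian with mean theta) and uses the standard maximum likelihood identity grad_theta log p_theta(x) = -grad_theta E_theta(x) + E_{x' ~ p_theta}[grad_theta E_theta(x')], with the expectation estimated from unadjusted Langevin chains. All names (`langevin_chain`, `log_likelihood_grad`, `train`, the `init` flag) and all hyperparameters are hypothetical choices made for illustration. The `init` flag only switches where the chains start: at data points with a short chain, as in the usual description of CD, or at noise samples with a longer chain.

```python
import random


def energy_grad_x(x, theta):
    # Toy energy (an assumption): E_theta(x) = 0.5 * (x - theta)**2, so dE/dx = x - theta.
    return x - theta


def energy_grad_theta(x, theta):
    # For the same toy energy, dE/dtheta = theta - x.
    return theta - x


def langevin_chain(x0, theta, n_steps, step, rng):
    # Unadjusted Langevin dynamics: gradient step on the energy plus Gaussian noise.
    x = x0
    for _ in range(n_steps):
        x = x - 0.5 * step * energy_grad_x(x, theta) + step ** 0.5 * rng.gauss(0.0, 1.0)
    return x


def log_likelihood_grad(positives, negatives, theta):
    # Monte Carlo estimate of -E_data[dE/dtheta] + E_model[dE/dtheta].
    pos = sum(energy_grad_theta(x, theta) for x in positives) / len(positives)
    neg = sum(energy_grad_theta(x, theta) for x in negatives) / len(negatives)
    return -pos + neg


def train(data, init, n_steps, iters=200, batch=64, lr=0.1, step=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iters):
        positives = rng.sample(data, batch)
        if init == "data":
            # Chains start at the data points themselves (CD-style initialization).
            starts = positives
        else:
            # Chains start from standard normal noise, independent of the data.
            starts = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        negatives = [langevin_chain(x0, theta, n_steps, step, rng) for x0 in starts]
        # Gradient ascent on the estimated log-likelihood.
        theta += lr * log_likelihood_grad(positives, negatives, theta)
    return theta


data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(256)]
data_mean = sum(data) / len(data)

theta_data_init = train(data, init="data", n_steps=1)
theta_noise_init = train(data, init="noise", n_steps=30)
```

In this toy model both settings should drive `theta` toward the sample mean of the data, since the energy is quadratic and the Langevin chain mixes quickly. That agreement is a property of the toy setup and should not be read as a statement about the CNN-parameterized models discussed in the paper.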

