ML ME · May 3, 2014

Why (and When and How) Contrastive Divergence Works

arXiv:1405.0602v1 · 8 citations

Originality: Incremental advance

AI Analysis

This work addresses a theoretical gap for researchers using CD in machine learning and statistics, offering practical guidance for algorithm design, though it is incremental in building on existing CD methods.

The paper examines the theoretical foundations of contrastive divergence (CD) for inference in high-dimensional distributions with intractable normalizing constants. It provides a framework for understanding when and how CD works, offers several justifications for the CD moment conditions, including a variational approximation, and applies the resulting algorithms to social network data using exponential-family random graph models (ERGMs).

Contrastive divergence (CD) is a promising method of inference in high-dimensional distributions with intractable normalizing constants; however, the theoretical foundations justifying its use are somewhat shaky. This document proposes a framework for understanding CD inference and how and when it works, and provides multiple justifications for the CD moment conditions, including framing them as a variational approximation. Algorithms for performing inference are discussed and applied to social network data using an exponential-family random graph model (ERGM). The framework also provides guidance on how to construct MCMC kernels that provide good CD inference, which turn out to be quite different from those typically used to provide fast global mixing.
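As a rough illustration of what a CD moment condition looks like in practice, the sketch below fits a toy model by choosing the parameter so that the expected sufficient statistic after k MCMC steps started from the observed data matches the observed statistic. It is not the paper's algorithm. The model (a single "edge count" statistic over independent binary dyads, with p(x) proportional to exp(theta * sum(x))), the single-site Gibbs kernel, the step-size schedule, and the names `edge_count`, `gibbs_steps`, and `cd_fit` are all assumptions made for this example.

```python
import math
import random


def edge_count(x):
    """Sufficient statistic of the toy model: number of entries equal to 1."""
    return sum(x)


def gibbs_steps(x, theta, k, rng):
    """Apply k single-site Gibbs updates for p(x) proportional to exp(theta * sum(x)).

    Each update picks one site at random and resamples it from its
    conditional distribution, which here is Bernoulli(sigmoid(theta)).
    """
    x = list(x)  # copy so the observed data is never modified
    p_on = 1.0 / (1.0 + math.exp(-theta))
    for _ in range(k):
        i = rng.randrange(len(x))
        x[i] = 1 if rng.random() < p_on else 0
    return x


def cd_fit(x_obs, k=1, n_chains=200, n_iters=500, lr=2.0, seed=0):
    """Stochastic-approximation solve of the CD-k moment condition.

    Seeks theta such that E[g(X_k) | X_0 = x_obs] = g(x_obs), where X_k is
    the state after k kernel steps started from the observed data.
    """
    rng = random.Random(seed)
    theta = 0.0
    g_obs = edge_count(x_obs)
    for t in range(1, n_iters + 1):
        # Monte Carlo estimate of the expected statistic after k steps,
        # with every short chain restarted at the observed data.
        g_sim = sum(
            edge_count(gibbs_steps(x_obs, theta, k, rng))
            for _ in range(n_chains)
        ) / n_chains
        # Move theta to reduce the gap between observed and simulated
        # statistics, with a decaying step size.
        theta += (lr / math.sqrt(t)) * (g_obs - g_sim)
    return theta


if __name__ == "__main__":
    x_obs = [1] * 5 + [0] * 15  # 5 "edges" among 20 dyads
    print("CD-1 estimate:", cd_fit(x_obs))
    print("logit of observed density:", math.log(0.25 / 0.75))
```

In this toy model with k=1, the expected change in the statistic from one random single-site update is sigmoid(theta) minus the observed density, so the CD fixed point coincides with the maximum likelihood estimate even though the kernel touches only one site and is far from globally mixing. Richer models with dependent terms would not generally share this property.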
