MLLGMay 20, 2016

Convergence of Contrastive Divergence with Annealed Learning Rate in Exponential Family

arXiv:1605.06220v1
Originality Incremental advance
AI Analysis

This provides theoretical guarantees for a variant of contrastive divergence, which is incremental for practitioners in machine learning and statistics.

The paper tackles the convergence of contrastive divergence with annealed learning rate in exponential families, establishing that it yields asymptotically consistent estimates with a convergence rate of 1/∛n, as demonstrated in experiments on a Boltzmann Machine.

In our recent paper, we showed that in exponential family, contrastive divergence (CD) with fixed learning rate will give asymptotically consistent estimates \cite{wu2016convergence}. In this paper, we establish consistency and convergence rate of CD with annealed learning rate $η_t$. Specifically, suppose CD-$m$ generates the sequence of parameters $\{θ_t\}_{t \ge 0}$ using an i.i.d. data sample $\mathbf{X}_1^n \sim p_{θ^*}$ of size $n$, then $δ_n(\mathbf{X}_1^n) = \limsup_{t \to \infty} \Vert \sum_{s=t_0}^t η_s θ_s / \sum_{s=t_0}^t η_s - θ^* \Vert$ converges in probability to 0 at a rate of $1/\sqrt[3]{n}$. The number ($m$) of MCMC transitions in CD only affects the coefficient factor of convergence rate. Our proof is not a simple extension of the one in \cite{wu2016convergence}. which depends critically on the fact that $\{θ_t\}_{t \ge 0}$ is a homogeneous Markov chain conditional on the observed sample $\mathbf{X}_1^n$. Under annealed learning rate, the homogeneous Markov property is not available and we have to develop an alternative approach based on super-martingales. Experiment results of CD on a fully-visible $2\times 2$ Boltzmann Machine are provided to demonstrate our theoretical results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes