LG · Oct 15, 2020

Bi-level Score Matching for Learning Energy-based Latent Variable Models

Fan Bao, Chongxuan Li, Kun Xu, Hang Su, Jun Zhu, Bo Zhang

arXiv:2010.07856v27.915 citations · Has Code

Originality: Incremental advance

AI Analysis

This work enables learning of energy-based latent variable models whose posteriors are intractable, a setting that score matching previously handled only in special cases. The advance is incremental, since it builds on existing score matching techniques.

The paper tackles the largely open problem of learning energy-based latent variable models (EBLVMs) with general structures. It proposes bi-level score matching (BiSM), which reformulates score matching as a bi-level optimization problem. Empirically, BiSM is comparable to contrastive divergence and score matching where those methods apply, and it can learn complex EBLVMs that generate natural images. The experiments cover Gaussian restricted Boltzmann machines and EBLVMs parameterized by deep convolutional neural networks.
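As background for the summary above, here is a minimal sketch of plain score matching on a model with no latent variables. It is an illustration only, not code from the paper. It assumes a one-dimensional Gaussian energy E(x) = 0.5 * precision * (x - mu)^2, for which the empirical score matching objective has a closed-form minimizer. The function names `sm_objective` and `sm_closed_form` are hypothetical.

```python
import random


def sm_objective(data, mu, precision):
    """Empirical score matching objective for an unnormalized 1-D Gaussian.

    Assumed energy: E(x) = 0.5 * precision * (x - mu)**2.
    Model score:    d/dx log p(x) = -precision * (x - mu).
    Its derivative: -precision.
    The objective averages 0.5 * score**2 + d(score)/dx over the data,
    so the partition function never has to be evaluated.
    """
    total = 0.0
    for x in data:
        score = -precision * (x - mu)
        total += 0.5 * score ** 2 - precision
    return total / len(data)


def sm_closed_form(data):
    """Minimizer of sm_objective: sample mean and inverse sample variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, 1.0 / var


random.seed(0)
# Synthetic data with true mean 2.0 and true precision 1 / 0.5**2 = 4.0.
data = [random.gauss(2.0, 0.5) for _ in range(5000)]

mu_hat, prec_hat = sm_closed_form(data)
best = sm_objective(data, mu_hat, prec_hat)
```

Setting the derivatives of the objective with respect to `mu` and `precision` to zero gives the sample mean and the inverse sample variance, which is why no iterative optimizer is needed in this toy case. Models with latent variables do not admit this shortcut in general, which is the gap the paper targets.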

Score matching (SM) provides a compelling approach to learning energy-based models (EBMs) because it avoids computing the partition function. However, learning energy-based latent variable models (EBLVMs) remains largely open, except in some special cases. This paper presents a bi-level score matching (BiSM) method to learn EBLVMs with general structures by reformulating SM as a bi-level optimization problem. The higher level introduces a variational posterior of the latent variables and optimizes a modified SM objective, and the lower level optimizes the variational posterior to fit the true posterior. To solve BiSM efficiently, we develop a stochastic optimization algorithm with gradient unrolling. Theoretically, we analyze the consistency of BiSM and the convergence of the stochastic algorithm. Empirically, we show the promise of BiSM in Gaussian restricted Boltzmann machines and highly nonstructural EBLVMs parameterized by deep convolutional neural networks. BiSM is comparable to the widely adopted contrastive divergence and SM methods when they are applicable, and it can learn complex EBLVMs with intractable posteriors to generate natural images.
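The bi-level structure and gradient unrolling described in the abstract can be sketched on a toy problem. The example below is not the BiSM objective. It assumes two scalar quadratics chosen for illustration: a lower-level loss G(theta, phi) = 0.5 * (phi - 2 * theta)^2, standing in for fitting the variational posterior, and a higher-level loss J(theta, phi) = 0.5 * (phi - 3)^2 + 0.5 * (theta - 1)^2, standing in for the modified SM objective. All names, losses, and step sizes are hypothetical.

```python
def unrolled_hypergradient(theta, phi, n_unroll, lr_lower):
    """Approximate dJ/dtheta by differentiating through n_unroll lower-level
    gradient steps on G, starting from the current phi.

    dphi tracks d(phi_k)/d(theta) through the unrolled updates by hand,
    treating the starting phi as a constant.
    """
    dphi = 0.0
    for _ in range(n_unroll):
        grad_g = phi - 2.0 * theta            # dG/dphi
        # phi_next = phi - lr * (phi - 2*theta); differentiate w.r.t. theta:
        dphi = dphi - lr_lower * (dphi - 2.0)
        phi = phi - lr_lower * grad_g
    dj_dphi = phi - 3.0                       # partial of J w.r.t. phi
    dj_dtheta_direct = theta - 1.0            # partial of J w.r.t. theta
    return dj_dtheta_direct + dj_dphi * dphi


def run_toy_bilevel(n_iters=200, k_lower=5, n_unroll=10,
                    lr_lower=0.5, lr_upper=0.1):
    theta, phi = 0.0, 0.0
    for _ in range(n_iters):
        # Lower level: a few persistent gradient steps on G for fixed theta.
        for _ in range(k_lower):
            phi -= lr_lower * (phi - 2.0 * theta)
        # Higher level: one step on J using the unrolled hypergradient.
        theta -= lr_upper * unrolled_hypergradient(theta, phi,
                                                   n_unroll, lr_lower)
    return theta, phi


theta_hat, phi_hat = run_toy_bilevel()
```

For these toy losses the exact bi-level solution is theta = 1.4 with phi = 2 * theta = 2.8, which follows from minimizing J(theta, 2 * theta). Because the unrolling is truncated after `n_unroll` steps, the hypergradient is only approximate and the iterate settles very close to that value rather than exactly on it. In a real EBLVM both levels would be stochastic objectives over neural network parameters, and an automatic differentiation library would replace the hand-written chain rule.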
