Jarzynski Reweighting and Sampling Dynamics for Training Energy-Based Models: Theoretical Analysis of Different Transition Kernels
This work offers theoretical insights for researchers in generative modeling, but the contribution is incremental: it builds on existing techniques and does not demonstrate broad state-of-the-art gains.
The paper addresses the challenge of training Energy-Based Models (EBMs) through a theoretical analysis of Jarzynski reweighting, which targets the biases introduced by traditional methods such as contrastive divergence and score matching. It illustrates the approach in flow-based diffusion models, where reweighting is used to improve sample quality, and in Restricted Boltzmann Machines, where it corrects the biases of contrastive divergence.
Energy-Based Models (EBMs) provide a flexible framework for generative modeling, but their training remains theoretically challenging due to the need to approximate normalization constants and efficiently sample from complex, multi-modal distributions. Traditional methods, such as contrastive divergence and score matching, introduce biases that can hinder accurate learning. In this work, we present a theoretical analysis of Jarzynski reweighting, a technique from non-equilibrium statistical mechanics, and its implications for training EBMs. We focus on the role of the choice of transition kernel, and we illustrate these theoretical considerations in two key generative frameworks: (i) flow-based diffusion models, where we reinterpret Jarzynski reweighting in the context of stochastic interpolants to mitigate discretization errors and improve sample quality, and (ii) Restricted Boltzmann Machines, where we analyze the role of reweighting in correcting the biases of contrastive divergence. Our results provide insights into the interplay between kernel choice and model performance, highlighting the potential of Jarzynski reweighting as a principled tool for generative learning.
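To make the reweighting idea concrete, the following is a minimal toy sketch and not the setup analyzed in the paper. It assumes a one-dimensional Gaussian energy U_k(x) = (x - m_k)^2 / 2 whose mean m_k drifts from 0 to 3, and it assumes a Metropolis-Hastings transition kernel, which leaves each intermediate density invariant. Under that assumption, the log-weight increment is simply the energy difference U_k(x) - U_{k+1}(x) evaluated before the move, as in annealed importance sampling. The function names, the energy family, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def energy(x, mean):
    # Toy quadratic energy (assumption): U(x) = (x - mean)^2 / 2
    return 0.5 * (x - mean) ** 2

def jarzynski_run(n_walkers=5000, n_steps=200, final_mean=3.0,
                  prop_std=1.0, seed=0):
    rng = np.random.default_rng(seed)
    means = np.linspace(0.0, final_mean, n_steps + 1)
    # Walkers start as exact samples from exp(-U_0) / Z_0 = N(0, 1)
    x = rng.standard_normal(n_walkers)
    log_w = np.zeros(n_walkers)
    for k in range(n_steps):
        # Weight update: minus the work done when the energy changes
        log_w += energy(x, means[k]) - energy(x, means[k + 1])
        # One random-walk Metropolis-Hastings step targeting exp(-U_{k+1})
        prop = x + prop_std * rng.standard_normal(n_walkers)
        log_acc = energy(x, means[k + 1]) - energy(prop, means[k + 1])
        accept = np.log(rng.random(n_walkers)) < log_acc
        x = np.where(accept, prop, x)
    return x, log_w

x, log_w = jarzynski_run()

# Self-normalized weights, computed stably in log space
w = np.exp(log_w - log_w.max())
w_norm = w / w.sum()

weighted_mean = float(np.sum(w_norm * x))   # estimates the final mean (3.0)
plain_mean = float(np.mean(x))              # unweighted estimate, may lag behind
z_ratio = float(np.mean(np.exp(log_w)))     # estimates Z_final / Z_0 (1.0 here)

print(weighted_mean, plain_mean, z_ratio)
```

In this sketch the weighted average corrects for the walkers lagging behind the moving target, and the mean weight estimates the ratio of normalization constants, which equals 1 here because every Gaussian in the family has the same variance. A kernel that does not leave the intermediate density invariant, such as an unadjusted Langevin step, needs a different weight update that accounts for the forward and backward transition probabilities; the sketch does not cover that case.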