LG · Apr 25, 2023

Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

arXiv:2304.12824v2 · 150 citations · h-index: 51
Originality: Highly original
AI Analysis

This paper addresses a general problem in applying diffusion models to real-world tasks such as offline RL and offers an exact solution for energy-guided sampling, although the contribution is incremental in that it improves how the guidance is estimated.

The paper tackles the problem that the intermediate guidance in energy-guided diffusion sampling is unknown. It proposes an exact formulation of that guidance and a new training objective, contrastive energy prediction (CEP), which is guaranteed to converge to the exact guidance given unlimited model capacity and data. The resulting method outperforms state-of-the-art algorithms on the D4RL offline reinforcement learning benchmarks.
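To make "exact intermediate guidance" concrete, the toy check below assumes the target p_0(x_0) ∝ q_0(x_0) exp(-β E(x_0)) and a Gaussian forward kernel x_t = α_t x_0 + σ_t ε, and defines the intermediate energy as E_t(x_t) = -log E_{q_{0t}(x_0|x_t)}[exp(-β E(x_0))]. Under those assumptions, q_t(x_t) exp(-E_t(x_t)) is proportional to the diffused target p_t(x_t), so -∇E_t is exactly the term to add to the score of q_t. The sketch makes x_0 discrete so every expectation is an exact sum; the support, probabilities, energies, β, α_t and σ_t are made-up illustrative values, not taken from the paper.

```python
import numpy as np

# Toy setup (all values are illustrative assumptions): x_0 takes three values
support = np.array([-2.0, 0.0, 1.5])     # possible values of x_0
q0 = np.array([0.3, 0.5, 0.2])           # behavior distribution q_0(x_0)
energy = np.array([1.0, 0.2, 2.0])       # energy E(x_0) at each support point
beta = 2.0                               # inverse temperature
alpha_t, sigma_t = 0.8, 0.6              # forward kernel: x_t = alpha_t * x_0 + sigma_t * eps


def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def kernel(x_t):
    # q_{t0}(x_t | x_0) evaluated at every support point of x_0
    return gaussian_pdf(x_t, alpha_t * support, sigma_t)


def q_t(x_t):
    # Marginal of the diffused behavior distribution
    return np.sum(q0 * kernel(x_t))


def intermediate_energy(x_t):
    # E_t(x_t) = -log E_{q_{0t}(x_0 | x_t)}[exp(-beta * E(x_0))]
    posterior = q0 * kernel(x_t) / q_t(x_t)
    return -np.log(np.sum(posterior * np.exp(-beta * energy)))


def guidance(x_t, h=1e-5):
    # -dE_t/dx_t by central differences: the term added to the score of q_t
    return -(intermediate_energy(x_t + h) - intermediate_energy(x_t - h)) / (2.0 * h)


# Target p_0(x_0) ∝ q_0(x_0) exp(-beta * E(x_0)) and its diffused marginal p_t
Z = np.sum(q0 * np.exp(-beta * energy))
p0 = q0 * np.exp(-beta * energy) / Z


def p_t(x_t):
    return np.sum(p0 * kernel(x_t))


def guided_density(x_t):
    # Density implied by the exact guidance: q_t(x_t) * exp(-E_t(x_t)) / Z
    return q_t(x_t) * np.exp(-intermediate_energy(x_t)) / Z


for x in [-1.5, 0.0, 0.7, 2.0]:
    print(f"x_t={x:+.1f}  p_t={p_t(x):.6f}  q_t*exp(-E_t)/Z={guided_density(x):.6f}")
```

The two printed columns agree at every x_t. The identity follows by expanding the posterior expectation, so it holds for any q_0 and E; in continuous, high-dimensional settings the expectation is intractable, which is what motivates learning E_t instead.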

Guided sampling, which embeds human-defined guidance into the sampling procedure, is a vital approach for applying diffusion models to real-world tasks. This paper considers a general setting where the guidance is defined by an (unnormalized) energy function. The main challenge in this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. Our method is guaranteed to converge to the exact guidance under unlimited model capacity and data samples, while previous methods cannot. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also provide examples of applying CEP to image synthesis to demonstrate its scalability to high-dimensional data.
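A minimal sketch of a CEP-style objective, as we read it: for K clean samples x_0^i with noisy versions x_t^i = α_t x_0^i + σ_t ε^i, a model f_φ(x_t, t) produces logits -f_φ(x_t^i, t), and a softmax cross-entropy over the K samples is weighted by exp(-β E(x_0^i)). The exact weighting, normalization and sampling scheme should be checked against the paper; the toy data, the energy E(x) = ||x||²/2, the noise schedule, and `hypothetical_model` are all illustrative assumptions, and nothing is trained here.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cep_loss(f_values, energies, beta):
    """Contrastive energy prediction loss for one group of K samples (sketch).

    f_values[i] = f_phi(x_t^i, t): predicted intermediate energy at the noisy sample
    energies[i] = E(x_0^i): energy of the clean sample that x_t^i was diffused from
    """
    weights = np.exp(-beta * energies)        # exp(-beta * E(x_0^i)), unnormalized
    log_probs = log_softmax(-f_values)        # log softmax over the group of -f_phi(x_t^j, t)
    return -np.sum(weights * log_probs)


def hypothetical_model(x_t, w):
    # Stand-in for a learned network f_phi(x_t, t): one quadratic feature scaled by w
    return w * np.sum(x_t ** 2, axis=-1)


rng = np.random.default_rng(0)
K, beta = 16, 1.0
x0 = rng.normal(size=(K, 2))                  # K clean samples (toy 2-D data)
energies = 0.5 * np.sum(x0 ** 2, axis=-1)     # toy energy E(x) = ||x||^2 / 2
alpha_t, sigma_t = 0.9, np.sqrt(1.0 - 0.9 ** 2)
x_t = alpha_t * x0 + sigma_t * rng.normal(size=x0.shape)  # x_t^i = alpha_t x_0^i + sigma_t eps^i

for w in [0.0, 0.25, 0.5]:
    print(f"w={w:.2f}  loss={cep_loss(hypothetical_model(x_t, w), energies, beta):.4f}")
```

Two properties are easy to see: the loss is unchanged when a constant is added to f (softmax invariance), which is why f can only be recovered up to a constant; and within one group, the loss is smallest when softmax(-f) is proportional to exp(-β E). The paper's stronger claim, convergence to the exact intermediate energy under unlimited capacity and data, is not demonstrated by this toy.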

Code Implementations: 3 repos
