LG, ML · Feb 10, 2025

No Trick, No Treat: Pursuits and Challenges Towards Simulation-free Training of Neural Samplers

Jiajun He, Yuanqi Du, Francisco Vargas, Dinghuai Zhang, Shreyas Padhy, RuiKang OuYang, Carla Gomes, José Miguel Hernández-Lobato

Cambridge

arXiv:2502.06685v2

Originality: Incremental advance

AI Analysis

This work addresses the computational cost of training neural samplers. The problem matters to machine learning researchers and practitioners who work on generative models and sampling.

The authors tackle simulation-free training of neural samplers. Their proposed modification to previous methods, built on a time-dependent normalizing flow, suffers from severe mode collapse. A broader analysis shows that most existing methods also fail to adequately cover the target without Langevin preconditioning. Finally, they show that combining Parallel Tempering with a generative model provides a strong baseline.

We consider the sampling problem, where the aim is to draw samples from a distribution whose density is known only up to a normalization constant. Recent breakthroughs in generative modeling for approximating high-dimensional data distributions have sparked significant interest in developing neural network-based methods for this challenging problem. However, neural samplers typically incur heavy computational overhead because they simulate trajectories during training. This motivates the pursuit of simulation-free training procedures for neural samplers. In this work, we propose an elegant modification to previous methods that allows simulation-free training with the help of a time-dependent normalizing flow. However, it ultimately suffers from severe mode collapse. On closer inspection, we find that nearly all successful neural samplers rely on Langevin preconditioning to avoid mode collapse. We systematically analyze several popular methods with various objective functions and demonstrate that, in the absence of Langevin preconditioning, most of them fail to adequately cover even a simple target. Finally, we draw attention to a strong baseline that combines the state-of-the-art MCMC method, Parallel Tempering (PT), with an additional generative model, to shed light on future explorations of neural samplers.
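Parallel Tempering, the MCMC half of the baseline described in the abstract, can be illustrated with a minimal sketch. It assumes a toy one-dimensional bimodal target, which is hypothetical and not taken from the paper. Several replicas run random-walk Metropolis on tempered densities π(x)^β. Adjacent replicas periodically propose to swap states, so the flatter high-temperature chains can carry the β = 1 chain across the low-density barrier between modes. The names `log_target` and `parallel_tempering`, the inverse temperatures, and the step size are illustrative assumptions. This is not the authors' implementation, and it omits the additional generative model they pair with PT.

```python
import numpy as np


def log_target(x):
    """Unnormalized log-density of a hypothetical 1-D bimodal target.

    Two unit-variance Gaussians centred at -4 and +4; the normalizing
    constant is never computed.
    """
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)


def parallel_tempering(log_p, betas, n_steps, step_size=1.0, seed=0):
    """Replica-exchange MCMC with one random-walk Metropolis chain per beta.

    betas[0] must be 1.0 (the actual target); smaller betas flatten
    p(x)**beta so those replicas can move between modes easily.
    Returns the states visited by the beta = 1 replica.
    """
    betas = np.asarray(betas, dtype=float)
    if betas[0] != 1.0:
        raise ValueError("betas[0] must be 1.0 (the untempered target)")
    rng = np.random.default_rng(seed)
    n_chains = len(betas)
    x = rng.normal(size=n_chains)  # current state of each replica
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # 1) Local Metropolis move in every replica, targeting p(x)**beta.
        prop = x + step_size * rng.normal(size=n_chains)
        log_accept = betas * (log_p(prop) - log_p(x))
        accept = np.log(rng.uniform(size=n_chains)) < log_accept
        x = np.where(accept, prop, x)
        # 2) Propose swapping states between adjacent temperatures.
        #    Acceptance ratio: exp((b_i - b_{i+1}) * (log p(x_{i+1}) - log p(x_i))).
        for i in range(n_chains - 1):
            log_swap = (betas[i] - betas[i + 1]) * (log_p(x[i + 1]) - log_p(x[i]))
            if np.log(rng.uniform()) < log_swap:
                x[i], x[i + 1] = x[i + 1], x[i]
        samples[t] = x[0]  # record only the beta = 1 replica
    return samples


if __name__ == "__main__":
    samples = parallel_tempering(log_target, [1.0, 0.3, 0.1, 0.03], n_steps=20000)
    # Fraction of target-chain samples in the right-hand mode.
    print(float((samples > 0).mean()))
```

Both the Metropolis and the swap steps use only differences of unnormalized log-densities, so the unknown normalization constant cancels. This is why PT fits the setting the abstract describes.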

