Reverse Markov Learning: Multi-Step Generative Models for Complex Distributions
This work addresses the problem of modeling complex distributions, such as those arising in image data, and is aimed at researchers and practitioners in machine learning. The contribution appears incremental, since it builds on existing engression methods.
The authors propose reverse Markov learning (RML), a framework that uses multiple engression models in a reverse process to reconstruct the target distribution step by step. Empirical results on simulated and climate data support its effectiveness.
Learning complex distributions is a fundamental challenge in contemporary applications. Shen and Meinshausen (2024) introduced engression, a generative approach based on scoring rules that maps noise (and covariates, if available) directly to data. While effective, engression can struggle with highly complex distributions, such as those encountered in image data. In this work, we propose reverse Markov learning (RML), a framework that defines a general forward process transitioning from the target distribution to a known distribution (e.g., Gaussian) and then learns a reverse Markov process using multiple engression models. This reverse process reconstructs the target distribution step by step. This framework accommodates general forward processes, allows for dimension reduction, and naturally discretizes the generative process. In the special case of diffusion-based forward processes, RML provides an efficient discretization strategy for both training and inference in diffusion models. We further introduce an alternating sampling scheme to enhance post-training performance. Our statistical analysis establishes error bounds for RML and elucidates its advantages in estimation efficiency and flexibility in forward process design. Empirical results on simulated and climate data corroborate the theoretical findings, demonstrating the effectiveness of RML in capturing complex distributions.
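The two ingredients described above, a scoring-rule loss for each generative model and a step-by-step reverse process, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the energy score as the scoring rule, and the names `energy_score_loss`, `reverse_sample`, and `generators` are hypothetical, as is the convention that `generators[t-1]` maps a state at step t (plus fresh noise) to a state at step t-1.

```python
import numpy as np


def energy_score_loss(y, sample_a, sample_b):
    """Empirical energy-score loss for one batch (assumed scoring rule).

    y:        (n, d) observed targets
    sample_a: (n, d) model draws g(x, eps), one per target
    sample_b: (n, d) independent second draws g(x, eps') for the same inputs
    """
    # Accuracy term: average distance from the two draws to the target.
    fit = 0.5 * (
        np.linalg.norm(y - sample_a, axis=1)
        + np.linalg.norm(y - sample_b, axis=1)
    )
    # Spread term: distance between the two draws, which discourages
    # the model from collapsing to a single point prediction.
    spread = np.linalg.norm(sample_a - sample_b, axis=1)
    return float(np.mean(fit - 0.5 * spread))


def reverse_sample(generators, x_T, rng):
    """Run a learned reverse Markov chain from step T down to step 0.

    generators: list of callables; generators[t-1](x_t, eps) returns a draw
                of the state at step t-1 given the state at step t
                (hypothetical interface).
    x_T:        (n, d) draws from the known end distribution, e.g. Gaussian
    rng:        numpy.random.Generator supplying the per-step noise
    """
    x = x_T
    # Apply the last model first: x_T -> x_{T-1} -> ... -> x_0.
    for g in reversed(generators):
        eps = rng.standard_normal(x.shape)
        x = g(x, eps)
    return x


# Toy usage with placeholder "models" standing in for trained networks.
rng = np.random.default_rng(0)
toy_generators = [
    lambda x, eps: 0.5 * x + 0.1 * eps,  # step 1 -> step 0
    lambda x, eps: 0.9 * x + 0.3 * eps,  # step 2 -> step 1
]
x_T = rng.standard_normal((4, 2))
x_0 = reverse_sample(toy_generators, x_T, rng)
```

In an actual training loop, each per-step model would be fit by minimizing such a loss on pairs of adjacent states produced by the forward process. The lambdas above are placeholders only and carry no learned structure.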