ST · DS · LG · NA · ML · Dec 23, 2024

Shifted Composition III: Local Error Framework for KL Divergence

arXiv:2412.17997v1 · 16 citations · h-index: 17
Originality: Highly original
AI Analysis

This work provides a unified analysis and improved KL divergence guarantees for sampling algorithms, particularly for randomized midpoint discretization of Langevin diffusion, which is significant for researchers working on stochastic processes and sampling methods.

This paper introduces a local error framework that uses the shifted composition rule to adapt coupling arguments for bounding the Kullback-Leibler (KL) divergence between two stochastic processes. Applied to sampling from a target distribution via the Langevin diffusion and its discretizations, it yields KL guarantees for the randomized midpoint discretization, attaining the optimal $\tilde O(\sqrt d/\epsilon)$ rate in the strongly log-concave (SLC) and log-Sobolev inequality (LSI) settings.

Coupling arguments are a central tool for bounding the deviation between two stochastic processes, but traditionally have been limited to Wasserstein metrics. In this paper, we apply the shifted composition rule, an information-theoretic principle introduced in our earlier work, in order to adapt coupling arguments to the Kullback-Leibler (KL) divergence. Our framework combines the strengths of two previously disparate approaches: local error analysis and Girsanov's theorem. Akin to the former, it yields tight bounds by incorporating the so-called weak error, and is user-friendly in that it only requires easily verified local assumptions; and akin to the latter, it yields KL divergence guarantees and applies beyond Wasserstein contractivity. We apply this framework to the problem of sampling from a target distribution $\pi$. Here, the two stochastic processes are the Langevin diffusion and an algorithmic discretization thereof. Our framework provides a unified analysis when $\pi$ is assumed to be strongly log-concave (SLC), weakly log-concave (WLC), or to satisfy a log-Sobolev inequality (LSI). Among other results, this yields KL guarantees for the randomized midpoint discretization of the Langevin diffusion. Notably, our result: (1) yields the optimal $\tilde O(\sqrt d/\epsilon)$ rate in the SLC and LSI settings; (2) is the first result to hold beyond the 2-Wasserstein metric in the SLC setting; and (3) is the first result to hold in any metric in the WLC and LSI settings.
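For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of a randomized midpoint discretization of the (overdamped) Langevin diffusion $dX_t = -\nabla V(X_t)\,dt + \sqrt{2}\,dB_t$ with target $\pi \propto e^{-V}$. It follows the standard construction (a gradient evaluated at a uniformly random intermediate time, with both Brownian increments drawn from the same path) and is not taken from the paper; the exact scheme, step sizes, and constants analyzed there may differ. The names `randomized_midpoint_langevin`, `grad_V`, `step`, and `n_steps` are hypothetical, and the standard Gaussian target in the usage lines is an assumption chosen only because its gradient is trivial.

```python
import numpy as np


def randomized_midpoint_langevin(grad_V, x0, step, n_steps, rng):
    """Sketch of a randomized midpoint discretization of Langevin diffusion.

    grad_V  : callable returning the gradient of the potential V (hypothetical name)
    x0      : initial point(s); an array of shape (d,) or (n_chains, d)
    step    : step size h
    n_steps : number of iterations
    rng     : numpy.random.Generator
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        # Random fraction of the step at which the gradient is evaluated.
        # For simplicity, one draw is shared by all chains in this iteration.
        u = rng.uniform()
        # Brownian increment over [0, u*h], then extended to [0, h],
        # so that both increments come from the same Brownian path.
        b_mid = np.sqrt(u * step) * rng.standard_normal(x.shape)
        b_end = b_mid + np.sqrt((1.0 - u) * step) * rng.standard_normal(x.shape)
        # Euler-type prediction of the state at the random intermediate time.
        x_mid = x - u * step * grad_V(x) + np.sqrt(2.0) * b_mid
        # Full step using the gradient at the random midpoint.
        x = x - step * grad_V(x_mid) + np.sqrt(2.0) * b_end
    return x


# Usage: standard Gaussian target, V(x) = |x|^2 / 2, so grad_V(x) = x (assumed example).
rng = np.random.default_rng(0)
samples = randomized_midpoint_langevin(
    grad_V=lambda x: x,
    x0=np.zeros((2000, 3)),  # 2000 independent chains in dimension 3
    step=0.1,
    n_steps=200,
    rng=rng,
)
print(samples.mean(), samples.var())
```

Evaluating the gradient at a random time makes the drift estimate unbiased for the time-averaged drift over the step, which is the usual intuition for why this scheme has a smaller weak error than the plain Euler-Maruyama discretization.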

