A Theoretical Comparison of No-U-Turn Sampler Variants: Necessary and Su?cient Convergence Conditions and Mixing Time Analysis under Gaussian Targets

Samuel Gruffaz, Kyurae Kim, Fares Guehtar, Hadrien Duval-decaix, Pacôme Trautmann

arXiv:2603.1864027.9h-index: 8

AI Analysis

This provides foundational theoretical guarantees for widely used Bayesian sampling algorithms, though it is incremental as it builds on prior work.

The paper addresses the theoretical gap between two No-U-Turn Sampler (NUTS) variants, NUTS-mul and NUTS-BPS, by deriving necessary and sufficient conditions for geometric ergodicity and analyzing mixing times on Gaussian targets. It finds that both variants have similar qualitative behavior with mixing times scaling as O(d^{1/4}) for dimension d, but NUTS-BPS has strictly smaller constants.

The No-U-Turn Sampler (NUTS) is the computational workhorse of modern Bayesian software libraries, yet its qualitative and quantitative convergence guarantees were established only recently. A significant gap remains in the theoretical comparison of its two main variants: NUTS-mul and NUTS-BPS, which use multinomial sampling and biased progressive sampling, respectively, for index selection. In this paper, we address this gap in three contributions. First, we derive the first necessary conditions for geometric ergodicity for both variants. Second, we establish the first sufficient conditions for geometric ergodicity and ergodicity for NUTS-mul. Third, we obtain the first mixing time result for NUTS-BPS on a standard Gaussian distribution. Our results show that NUTS-mul and NUTS-BPS exhibit nearly identical qualitative behavior, with geometric ergodicity depending on the tail properties of the target distribution. However, they differ quantitatively in their convergence rates. More precisely, when initialized in the typical set of the canonical Gaussian measure, the mixing times of both NUTS-mul and NUTS-BPS scale as $O(d^{1/4})$ up to logarithmic factors, where $d$ denotes the dimension. Nevertheless, the associated constants are strictly smaller for NUTS-BPS.

View on arXiv PDF

Similar