DSApr 6, 2022
Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic IndependenceNima Anari, Yang P. Liu, Thuy-Duong Vuong
We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing. For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^ω)$ time after an initial $\widetilde{O}(nk^{ω-1})$ time preprocessing, where $ω<2.372864$ is the matrix multiplication exponent. We even improve the state of the art for obtaining a single sample from determinantal point processes, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^ω\})$ to $\widetilde{O}(nk^{ω-1})$. In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $μ$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, we can can achieve the optimal $t=\widetilde{O}(k)$. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently assuming convenient access to approximate overestimates for marginals of $μ$. Having access to marginals is analogous to having access to the mean and covariance of a continuous distribution, or knowing "isotropy" for the distribution, the key assumption behind the Kannan-Lovász-Simonovits (KLS) conjecture and optimal samplers based on it. We view our result as a moral analog of the KLS conjecture and its consequences for sampling, for discrete strongly Rayleigh measures.
LGOct 3, 2023
Sampling Multimodal Distributions with the Vanilla Score: Benefits of Data-Based InitializationFrederic Koehler, Thuy-Duong Vuong
There is a long history, as well as a recent explosion of interest, in statistical and generative modeling approaches based on score functions -- derivatives of the log-likelihood of a distribution. In seminal works, Hyvärinen proposed vanilla score matching as a way to learn distributions from data by computing an estimate of the score function of the underlying ground truth, and established connections between this method and established techniques like Contrastive Divergence and Pseudolikelihood estimation. It is by now well-known that vanilla score matching has significant difficulties learning multimodal distributions. Although there are various ways to overcome this difficulty, the following question has remained unanswered -- is there a natural way to sample multimodal distributions using just the vanilla score? Inspired by a long line of related experimental works, we prove that the Langevin diffusion with early stopping, initialized at the empirical distribution, and run on a score function estimated from data successfully generates natural multimodal distributions (mixtures of log-concave distributions).
84.8LGMay 23
A computational phase transition for learning-to-sample from Ising modelsAndrej Risteski, Thuy-Duong Vuong
We study \emph{learning-to-sample} -- a basic algorithmic task underlying generative modeling -- for Ising models, a standard testbed for algorithmic ideas in both theoretical computer science and machine learning. Given i.i.d. samples of an unknown target distribution, the goal of learning-to-sample is to learn a computationally efficient generation procedure that produces new samples following approximately the same distribution. We construct a family of Ising models of constantly bounded-width which lie just beyond the spectral threshold $λ_{\max}(J)-λ_{\min}(J)=1$, and show that learning-to-sample for this family is computationally hard under standard cryptographic assumptions, even when the learner is given both polynomially many i.i.d. samples from the model and explicit access to its parameters. Combined with results of [AJKPV24,KLV25] showing tractability of learning-to-sample below the spectral threshold, this establishes a sharp computational phase transition at the spectral threshold. Moreover, combined with prior results on parameter learning for bounded-width Ising models [KM17,WSD19,VML20], this shows that learning-to-sample can be more difficult than parameter learning. Finally, we show that any efficient learner for these hard instances exhibits a natural memorization-hallucination dichotomy: the learner must either output configurations that, after a simple transformation, match the (transformed) training data or place substantial mass on configurations of negligible probability under the target distribution.
DSNov 11, 2025
Parallel Sampling via AutospeculationNima Anari, Carlo Baronio, CJ Chen et al.
We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses a target distribution $μ$ on $[q]^n$ through an oracle that provides conditional marginals, while a denoising diffusion model accesses a target distribution $μ$ on $\mathbb{R}^n$ through an oracle that provides conditional means under Gaussian noise. Standard sequential sampling algorithms require $\widetilde{O}(n)$ time to produce a sample from $μ$ in either setting. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. This improves the previous $\widetilde{O}(n^{2/3})$ bound for any-order autoregressive models and yields the first parallel speedup for diffusion models in the high-accuracy regime, under the relatively mild assumption that the support of $μ$ is bounded. We introduce a novel technique to obtain our results: speculative rejection sampling. This technique leverages an auxiliary ``speculative'' distribution~$ν$ that approximates~$μ$ to accelerate sampling. Our technique is inspired by the well-studied ``speculative decoding'' techniques popular in large language models, but differs in key ways. Firstly, we use ``autospeculation,'' namely we build the speculation $ν$ out of the same oracle that defines~$μ$. In contrast, speculative decoding typically requires a separate, faster, but potentially less accurate ``draft'' model $ν$. Secondly, the key differentiating factor in our technique is that we make and accept speculations at a ``sequence'' level rather than at the level of single (or a few) steps. This last fact is key to unlocking our parallel runtime of $\widetilde{O}(n^{1/2})$.
91.3ITApr 13
Entropic independence via sparse localizationVishesh Jain, Huy Tuan Pham, Thuy-Duong Vuong
Entropic independence is a structural property of measures that underlies modern proofs of functional inequalities, notably (modified) log-Sobolev inequalities, via ``annealing'' or local-to-global schemes. Existing sufficient criteria for entropic independence typically require spectral independence and/or uniform bounds on marginals under \emph{all} pinnings, which can fail in natural canonical-ensemble models even when strong mixing properties are expected. We introduce \emph{sparse localization}: a restricted localization framework, in the spirit of Chen--Eldan, in which one assumes $\ell_2$-independence only for a sparse family of pinnings (those fixing at most $cn$ coordinates for any $c > 0$), yet still deduces quadratic entropic stability and entropic independence with an explicit multiplicative loss of order $c^{-1}$. As an application, we give a rigorous proof of approximate conservation of entropy for the uniform distribution on independent sets of a given size in bounded degree graphs.
LGDec 21, 2023
Fairness in Submodular Maximization over a Matroid ConstraintMarwa El Halabi, Jakub Tarnawski, Ashkan Norouzi-Fard et al.
Submodular maximization over a matroid constraint is a fundamental problem with various applications in machine learning. Some of these applications involve decision-making over datapoints with sensitive attributes such as gender or race. In such settings, it is crucial to guarantee that the selected solution is fairly distributed with respect to this attribute. Recently, fairness has been investigated in submodular maximization under a cardinality constraint in both the streaming and offline settings, however the more general problem with matroid constraint has only been considered in the streaming setting and only for monotone objectives. This work fills this gap. We propose various algorithms and impossibility results offering different trade-offs between quality, fairness, and generality.
LGNov 14, 2024
Efficiently learning and sampling multimodal distributions with data-based initializationFrederic Koehler, Holden Lee, Thuy-Duong Vuong
We consider the problem of sampling a multimodal distribution with a Markov chain given a small number of samples from the stationary measure. Although mixing can be arbitrarily slow, we show that if the Markov chain has a $k$th order spectral gap, initialization from a set of $\tilde O(k/\varepsilon^2)$ samples from the stationary distribution will, with high probability over the samples, efficiently generate a sample whose conditional law is $\varepsilon$-close in TV distance to the stationary measure. In particular, this applies to mixtures of $k$ distributions satisfying a Poincaré inequality, with faster convergence when they satisfy a log-Sobolev inequality. Our bounds are stable to perturbations to the Markov chain, and in particular work for Langevin diffusion over $\mathbb R^d$ with score estimation error, as well as Glauber dynamics combined with approximation error from pseudolikelihood estimation. This justifies the success of data-based initialization for score matching methods despite slow mixing for the data distribution, and improves and generalizes the results of Koehler and Vuong (2023) to have linear, rather than exponential, dependence on $k$ and apply to arbitrary semigroups. As a consequence of our results, we show for the first time that a natural class of low-complexity Ising measures can be efficiently learned from samples.
LGFeb 10, 2021
From Sampling to Optimization on Discrete Domains with Applications to Determinant MaximizationNima Anari, Thuy-Duong Vuong
We show a connection between sampling and optimization on discrete domains. For a family of distributions $μ$ defined on size $k$ subsets of a ground set of elements that is closed under external fields, we show that rapid mixing of natural local random walks implies the existence of simple approximation algorithms to find $\max μ(\cdot)$. More precisely we show that if (multi-step) down-up random walks have spectral gap at least inverse polynomially large in $k$, then (multi-step) local search can find $\max μ(\cdot)$ within a factor of $k^{O(k)}$. As the main application of our result, we show a simple nearly-optimal $k^{O(k)}$-factor approximation algorithm for MAP inference on nonsymmetric DPPs. This is the first nontrivial multiplicative approximation for finding the largest size $k$ principal minor of a square (not-necessarily-symmetric) matrix $L$ with $L+L^\intercal\succeq 0$. We establish the connection between sampling and optimization by showing that an exchange inequality, a concept rooted in discrete convex analysis, can be derived from fast mixing of local random walks. We further connect exchange inequalities with composable core-sets for optimization, generalizing recent results on composable core-sets for DPP maximization to arbitrary distributions that satisfy either the strongly Rayleigh property or that have a log-concave generating polynomial.