Nima Anari

DS
h-index66
17papers
489citations
Novelty63%
AI Score59

17 Papers

DSMay 31
Constant-Stretch Rounding on the Hypersimplex

Nima Anari, Alireza Haqi, Eric Ma

We study correlated rounding on the hypersimplex, the base polytope of the uniform matroid. For each point $x$ in the hypersimplex, the goal is to sample a $k$-subset $A(x)$ with marginals $x$, while coupling the samples for all choices of $x$ so that nearby inputs produce nearby sets. We give a constant-stretch scheme. Our scheme samples the maximum-entropy $k$-subset distribution with prescribed marginals using a common random ordering and common uniform thresholds. For every $x,y\in[0,1]^n$ with $\sum_i x_i=\sum_i y_i=k$, it satisfies $\mathbb{E}[|A(x)\triangle A(y)|]\le 6\|x-y\|_1$. This improves the previous $O(\log k)$ bound for hypersimplex correlated rounding and answers an open question raised by Naor, Raju, Shetty, Srinivasan, Valieva, and Wajc. By adding dummy coordinates, the same result gives stretch at most $12$ for the at-most-$k$ polytope. The proof was found in a GPT 5.5 Pro Extended conversation prompted by the authors, and Codex was used to help assemble the manuscript under the authors' supervision.

DSMay 28
Sampling Directed Eulerian Tours in $\widetilde O(m^{3/2})$ Time

Nima Anari

We give a randomized algorithm that samples a nearly uniform Eulerian tour of a directed Eulerian multigraph with $m$ arcs in $\widetilde O(m^{3/2})$ time. The guarantee is worst-case, applies to arbitrary directed Eulerian multigraphs, and breaks the $mn$-type arborescence-sampling barrier on sparse graphs. The core case is a $2$-in/$2$-out graph. We introduce a new local Markov chain, the flip--repair walk: one step locally splits a tour into two circuits and then chooses uniformly among the local flips that repair the state to one tour. We prove that this walk mixes in nearly linear many steps and implement the walk using a dynamic chord data structure. A pointwise degree-reduction wrapper extends the sampler from this degree-two core to arbitrary degrees while preserving the $\widetilde O(m^{3/2})$ total running time. The high-level algorithmic plan, the switching-network reduction, and the dynamic data-structure argument were devised by the author. The author conjectured the mixing theorem underlying the analysis, and GPT 5.5 Pro Extended produced its linear-algebra proof. Codex assisted with manuscript assembly and typesetting.

DSApr 6, 2022
Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence

Nima Anari, Yang P. Liu, Thuy-Duong Vuong

We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing. For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^ω)$ time after an initial $\widetilde{O}(nk^{ω-1})$ time preprocessing, where $ω<2.372864$ is the matrix multiplication exponent. We even improve the state of the art for obtaining a single sample from determinantal point processes, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^ω\})$ to $\widetilde{O}(nk^{ω-1})$. In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $μ$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, we can can achieve the optimal $t=\widetilde{O}(k)$. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently assuming convenient access to approximate overestimates for marginals of $μ$. Having access to marginals is analogous to having access to the mean and covariance of a continuous distribution, or knowing "isotropy" for the distribution, the key assumption behind the Kannan-Lovász-Simonovits (KLS) conjecture and optimal samplers based on it. We view our result as a moral analog of the KLS conjecture and its consequences for sampling, for discrete strongly Rayleigh measures.

DSAug 18, 2024
Parallel Sampling via Counting

Nima Anari, Ruiquan Gao, Aviad Rubinstein

We show how to use parallelization to speed up sampling from an arbitrary distribution $μ$ on a product space $[q]^n$, given oracle access to counting queries: $\mathbb{P}_{X\sim μ}[X_S=σ_S]$ for any $S\subseteq [n]$ and $σ_S \in [q]^S$. Our algorithm takes $O({n^{2/3}\cdot \operatorname{polylog}(n,q)})$ parallel time, to the best of our knowledge, the first sublinear in $n$ runtime for arbitrary distributions. Our results have implications for sampling in autoregressive models. Our algorithm directly works with an equivalent oracle that answers conditional marginal queries $\mathbb{P}_{X\sim μ}[X_i=σ_i\;\vert\; X_S=σ_S]$, whose role is played by a trained neural network in autoregressive models. This suggests a roughly $n^{1/3}$-factor speedup is possible for sampling in any-order autoregressive models. We complement our positive result by showing a lower bound of $\widetildeΩ(n^{1/3})$ for the runtime of any parallel sampling algorithm making at most $\operatorname{poly}(n)$ queries to the counting oracle, even for $q=2$.

DSMar 26
Fast Spanning Tree Sampling in Broadcast Congested Clique

Nima Anari, Alireza Haqi

We present the first polylogarithmic-round algorithm for sampling a random spanning tree in the (Broadcast) Congested Clique model. For any constant $c > 0$, our algorithm outputs a sample from a distribution whose total variation distance from the uniform spanning tree distribution is at most $O(n^{-c})$ in at most $c \cdot \log^{O(1)}(n)$ rounds. The exponent hidden in $\log^{O(1)}(n)$ is an absolute constant independent of $c$ and $n$. This is an exponential improvement over the previous best algorithm of Pemmaraju, Roy, and Sobel (PODC 2025) for the Congested Clique model.

DSNov 11, 2025
Parallel Sampling via Autospeculation

Nima Anari, Carlo Baronio, CJ Chen et al.

We present parallel algorithms to accelerate sampling via counting in two settings: any-order autoregressive models and denoising diffusion models. An any-order autoregressive model accesses a target distribution $μ$ on $[q]^n$ through an oracle that provides conditional marginals, while a denoising diffusion model accesses a target distribution $μ$ on $\mathbb{R}^n$ through an oracle that provides conditional means under Gaussian noise. Standard sequential sampling algorithms require $\widetilde{O}(n)$ time to produce a sample from $μ$ in either setting. We show that, by issuing oracle calls in parallel, the expected sampling time can be reduced to $\widetilde{O}(n^{1/2})$. This improves the previous $\widetilde{O}(n^{2/3})$ bound for any-order autoregressive models and yields the first parallel speedup for diffusion models in the high-accuracy regime, under the relatively mild assumption that the support of $μ$ is bounded. We introduce a novel technique to obtain our results: speculative rejection sampling. This technique leverages an auxiliary ``speculative'' distribution~$ν$ that approximates~$μ$ to accelerate sampling. Our technique is inspired by the well-studied ``speculative decoding'' techniques popular in large language models, but differs in key ways. Firstly, we use ``autospeculation,'' namely we build the speculation $ν$ out of the same oracle that defines~$μ$. In contrast, speculative decoding typically requires a separate, faster, but potentially less accurate ``draft'' model $ν$. Secondly, the key differentiating factor in our technique is that we make and accept speculations at a ``sequence'' level rather than at the level of single (or a few) steps. This last fact is key to unlocking our parallel runtime of $\widetilde{O}(n^{1/2})$.

DSMay 21
Optimal $e^{(γ+o(1))n}$-Approximation of the Permanent of Positive Semidefinite Matrices

Nima Anari, Farzam Ebrahimnejad

We determine, up to lower-order terms in the exponent, the best possible deterministic polynomial-time approximation ratio for the permanent of a Hermitian positive semidefinite matrix. If $A\succeq 0$ has no zero diagonal entry, $d=\operatorname{rank}(A)$, $A=VV^\dagger$ with $V\in\mathbb{C}^{n\times d}$ full column rank, and $v_1,\ldots,v_n$ are the rows of $V$, define \[ Φ(V)=\max_{X\succ 0} \left\{\sum_{i=1}^n \log(v_i^\dagger Xv_i)+\log\det X-\operatorname{tr} X+d\right\}, \qquad \widehat P(A)=e^{Φ(V)}. \] We prove the exact sandwich \[ e^{-γn}\widehat P(A)\le \operatorname{per}(A)\le \widehat P(A). \] Here $γ$ is the Euler--Mascheroni constant. Since the maximization is concave, this gives a deterministic polynomial-time $e^{(γ+\varepsilon)n}$-approximation for every $\varepsilon>0$. Combined with the previous $e^{(γ-\varepsilon)n}$-hardness of approximation for positive semidefinite permanents, this resolves the optimal exponential approximation ratio for deterministic polynomial-time algorithms as $e^{(γ+o(1))n}$, assuming $\mathrm{P}\ne\mathrm{NP}$. The proof is an entropy argument applied to the standard Wick integral formula for $\operatorname{per}(A)$; the loss is exactly $γ$ per factor because $\mathbb{E}[\log T]=-γ$ for $T\sim\operatorname{Exp}(1)$. The result was obtained through interactions with GPT 5.5 Pro Extended: the first author's interaction was one-shot, while the second author's was a separate multi-turn interaction with high-level guidance. Both authors verified the theorem and proof. Codex was used to assemble and typeset the manuscript.

LGFeb 24, 2024
Batch Active Learning of Reward Functions from Human Preferences

Erdem Bıyık, Nima Anari, Dorsa Sadigh

Data generation and labeling are often expensive in robot learning. Preference-based learning is a concept that enables reliable labeling by querying users with preference questions. Active querying methods are commonly employed in preference-based learning to generate more informative data at the expense of parallelization and computation time. In this paper, we develop a set of novel algorithms, batch active preference-based learning methods, that enable efficient learning of reward functions using as few data samples as possible while still having short query generation times and also retaining parallelizability. We introduce a method based on determinantal point processes (DPP) for active batch generation and several heuristic-based alternatives. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We showcase one of our algorithms in a study to learn human users' preferences.

LGMay 6, 2025
Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation

Hengyuan Hu, Aniket Das, Dorsa Sadigh et al.

Denoising Diffusion Probabilistic Models (DDPMs) have emerged as powerful tools for generative modeling. However, their sequential computation requirements lead to significant inference-time bottlenecks. In this work, we utilize the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of DDPM satisfy an exchangeability property. This general insight enables near-black-box adaptation of various performance optimization techniques from autoregressive models to the diffusion setting. To demonstrate this, we introduce \emph{Autospeculative Decoding} (ASD), an extension of the widely used speculative decoding algorithm to DDPMs that does not require any auxiliary draft models. Our theoretical analysis shows that ASD achieves a $\tilde{O} (K^{\frac{1}{3}})$ parallel runtime speedup over the $K$ step sequential DDPM. We also demonstrate that a practical implementation of autospeculative decoding accelerates DDPM inference significantly in various domains.

LGMay 25, 2023
Parallel Sampling of Diffusion Models

Andy Shih, Suneel Belkhale, Stefano Ermon et al.

Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that surprisingly it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 14.6s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score.

LGSep 27, 2021
Learning Multimodal Rewards from Rankings

Vivek Myers, Erdem Bıyık, Nima Anari et al.

Learning from human feedback has shown to be a useful approach in acquiring robot reward functions. However, expert feedback is often assumed to be drawn from an underlying unimodal reward function. This assumption does not always hold including in settings where multiple experts provide data or when a single expert provides data for different tasks -- we thus go beyond learning a unimodal reward and focus on learning a multimodal reward function. We formulate the multimodal reward learning as a mixture learning problem and develop a novel ranking-based learning approach, where the experts are only required to rank a given set of trajectories. Furthermore, as access to interaction data is often expensive in robotics, we develop an active querying approach to accelerate the learning process. We conduct experiments and user studies using a multi-task variant of OpenAI's LunarLander and a real Fetch robot, where we collect data from multiple users with different preferences. The results suggest that our approach can efficiently learn multimodal reward functions, and improve data-efficiency over benchmark methods that we adapt to our learning problem.

LGFeb 10, 2021
From Sampling to Optimization on Discrete Domains with Applications to Determinant Maximization

Nima Anari, Thuy-Duong Vuong

We show a connection between sampling and optimization on discrete domains. For a family of distributions $μ$ defined on size $k$ subsets of a ground set of elements that is closed under external fields, we show that rapid mixing of natural local random walks implies the existence of simple approximation algorithms to find $\max μ(\cdot)$. More precisely we show that if (multi-step) down-up random walks have spectral gap at least inverse polynomially large in $k$, then (multi-step) local search can find $\max μ(\cdot)$ within a factor of $k^{O(k)}$. As the main application of our result, we show a simple nearly-optimal $k^{O(k)}$-factor approximation algorithm for MAP inference on nonsymmetric DPPs. This is the first nontrivial multiplicative approximation for finding the largest size $k$ principal minor of a square (not-necessarily-symmetric) matrix $L$ with $L+L^\intercal\succeq 0$. We establish the connection between sampling and optimization by showing that an exchange inequality, a concept rooted in discrete convex analysis, can be derived from fast mixing of local random walks. We further connect exchange inequalities with composable core-sets for optimization, generalizing recent results on composable core-sets for DPP maximization to arbitrary distributions that satisfy either the strongly Rayleigh property or that have a log-concave generating polynomial.

DSNov 5, 2020
Instance Based Approximations to Profile Maximum Likelihood

Nima Anari, Moses Charikar, Kirankumar Shiragur et al.

In this paper we provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation. We provide an algorithm which matches the previous best known efficient algorithms for computing approximate PML distributions and improves when the number of distinct observed frequencies in the given instance is small. We achieve this result by exploiting new sparsity structure in approximate PML distributions and providing a new matrix rounding algorithm, of independent interest. Leveraging this result, we obtain the first provable computationally efficient implementation of PseudoPML, a general framework for estimating a broad class of symmetric properties. Additionally, we obtain efficient PML-based estimators for distributions with small profile entropy, a natural instance-based complexity measure. Further, we provide a simpler and more practical PseudoPML implementation that matches the best-known theoretical guarantees of such an estimator and evaluate this method empirically.

DSApr 6, 2020
The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications for Profile Maximum Likelihood

Nima Anari, Moses Charikar, Kirankumar Shiragur et al.

In this paper we consider the problem of computing the likelihood of the profile of a discrete distribution, i.e., the probability of observing the multiset of element frequencies, and computing a profile maximum likelihood (PML) distribution, i.e., a distribution with the maximum profile likelihood. For each problem we provide polynomial time algorithms that given $n$ i.i.d.\ samples from a discrete distribution, achieve an approximation factor of $\exp\left(-O(\sqrt{n} \log n) \right)$, improving upon the previous best-known bound achievable in polynomial time of $\exp(-O(n^{2/3} \log n))$ (Charikar, Shiragur and Sidford, 2019). Through the work of Acharya, Das, Orlitsky and Suresh (2016), this implies a polynomial time universal estimator for symmetric properties of discrete distributions in a broader range of error parameter. We achieve these results by providing new bounds on the quality of approximation of the Bethe and Sinkhorn permanents (Vontobel, 2012 and 2014). We show that each of these are $\exp(O(k \log(N/k)))$ approximations to the permanent of $N \times N$ matrices with non-negative rank at most $k$, improving upon the previous known bounds of $\exp(O(N))$. To obtain our results on PML, we exploit the fact that the PML objective is proportional to the permanent of a certain Vandermonde matrix with $\sqrt{n}$ distinct columns, i.e. with non-negative rank at most $\sqrt{n}$. As a by-product of our work we establish a surprising connection between the convex relaxation in prior work (CSS19) and the well-studied Bethe and Sinkhorn approximations.

LGJun 19, 2019
Batch Active Learning Using Determinantal Point Processes

Erdem Bıyık, Kenneth Wang, Nima Anari et al.

Data collection and labeling is one of the main challenges in employing machine learning algorithms in a variety of real-world applications with limited data. While active learning methods attempt to tackle this issue by labeling only the data samples that give high information, they generally suffer from large computational costs and are impractical in settings where data can be collected in parallel. Batch active learning methods attempt to overcome this computational burden by querying batches of samples at a time. To avoid redundancy between samples, previous works rely on some ad hoc combination of sample quality and diversity. In this paper, we present a new principled batch active learning method using Determinantal Point Processes, a repulsive point process that enables generating diverse batches of samples. We develop tractable algorithms to approximate the mode of a DPP distribution, and provide theoretical guarantees on the degree of approximation. We further demonstrate that an iterative greedy method for DPP maximization, which has lower computational costs but worse theoretical guarantees, still gives competitive results for batch active learning. Our experiments show the value of our methods on several datasets against state-of-the-art baselines.

DSOct 28, 2018
Smoothed Analysis of Discrete Tensor Decomposition and Assemblies of Neurons

Nima Anari, Constantinos Daskalakis, Wolfgang Maass et al.

We analyze linear independence of rank one tensors produced by tensor powers of randomly perturbed vectors. This enables efficient decomposition of sums of high-order tensors. Our analysis builds upon [BCMV14] but allows for a wider range of perturbation models, including discrete ones. We give an application to recovering assemblies of neurons. Assemblies are large sets of neurons representing specific memories or concepts. The size of the intersection of two assemblies has been shown in experiments to represent the extent to which these memories co-occur or these concepts are related; the phenomenon is called association of assemblies. This suggests that an animal's memory is a complex web of associations, and poses the problem of recovering this representation from cognitive data. Motivated by this problem, we study the following more general question: Can we reconstruct the Venn diagram of a family of sets, given the sizes of their $\ell$-wise intersections? We show that as long as the family of sets is randomly perturbed, it is enough for the number of measurements to be polynomially larger than the number of nonempty regions of the Venn diagram to fully reconstruct the diagram.

LGFeb 16, 2016
Monte Carlo Markov Chain Algorithms for Sampling Strongly Rayleigh Distributions and Determinantal Point Processes

Nima Anari, Shayan Oveis Gharan, Alireza Rezaei

Strongly Rayleigh distributions are natural generalizations of product and determinantal probability distributions and satisfy strongest form of negative dependence properties. We show that the "natural" Monte Carlo Markov Chain (MCMC) is rapidly mixing in the support of a {\em homogeneous} strongly Rayleigh distribution. As a byproduct, our proof implies Markov chains can be used to efficiently generate approximate samples of a $k$-determinantal point process. This answers an open question raised by Deshpande and Rademacher.