David G. Harris

4 papers

Novelty 59%

AI Score 47

Ranked #54,652 of 205,806 authors (top 27%); #157 in DS (top 27%)

4 Papers

62.1 DS May 24

The Dirichlet Mechanism for rounding with strong negative correlation, with applications

David G. Harris, George Z. Li, Nitya Raju et al.

Many optimization and scheduling problems can be abstracted in terms of a bipartite ``assignment graph'' $G = (L \cup R, E)$, where the goal is to select exactly one edge for each right-node. For example, a right-node may correspond to a job, and a left-node to a possible machine assignment. A common strategy to solve such problems is to obtain a fractional relaxation $x_e$ for each edge $e$, and then have each right-node independently select an edge with probability $x_e$. However, this may cause the left-nodes to become unevenly loaded, leading to suboptimal solutions for some problems. To address this, a number of algorithms for dependent rounding with strong negative correlation have been developed, e.g. Bansal, Srinivasan & Svensson (2021), Im & Shadloo (2020), Im & Li (2023), Harris (2024), Naor, Srinivasan & Wajc (2025). We introduce a new method for this, which we call the \emph{Dirichlet mechanism}. It is based on having each left-node draw Dirichlet random variables for its edges, and then having each right-node select an edge based on these values. This achieves quantitatively stronger negative correlation than previous algorithms, and is also simpler since it avoids the need for a tie-breaking mechanism. We illustrate the mechanism with improved approximation ratios for two problems. For oblivious online dependent rounding, we achieve a $0.68$-approximation which improves upon the previous $0.652$-approximation of Naor, Srinivasan & Wajc (2025). For the problem of scheduling jobs on unrelated machines to minimize weighted completion time, we achieve a $1.387$-approximation which improves upon the $1.398$-approximation of Harris (2024). (A recent algorithm of Li (2025) based on iterated rounding also provides a $1.36$-approximation if the weights of each job are independent of machine.)
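
The independent-rounding baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration of that baseline only, not the Dirichlet mechanism, whose selection rule the abstract does not spell out. The function names `independent_rounding` and `loads`, the dictionary layout of the fractional solution, and the toy instance are all assumptions made for the example.

```python
import random
from collections import Counter

def independent_rounding(x, rng):
    """Round a fractional assignment independently per right-node.

    x maps each right-node (job) to a dict {left-node (machine): x_e},
    where the x_e values of a job are assumed to sum to 1.
    Returns a dict job -> chosen machine.
    """
    assignment = {}
    for job, fractions in x.items():
        machines = list(fractions)
        weights = [fractions[m] for m in machines]
        # Each job picks exactly one edge, with probability x_e.
        assignment[job] = rng.choices(machines, weights=weights, k=1)[0]
    return assignment

def loads(assignment):
    """Count how many jobs landed on each machine."""
    return Counter(assignment.values())

# Hypothetical toy instance: 4 jobs, 2 machines, every x_e = 0.5.
x = {job: {"m0": 0.5, "m1": 0.5} for job in range(4)}
rng = random.Random(0)
assignment = independent_rounding(x, rng)
print(loads(assignment))
```

Because the jobs choose independently, nothing prevents all four from landing on the same machine, which is the uneven loading that dependent rounding with negative correlation is meant to reduce.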

78.3 PR Apr 3

Simple parallel estimation of the partition ratio for Gibbs distributions

David G. Harris, Vladimir Kolmogorov

We consider the problem of estimating the partition function $Z(β)=\sum_x \exp(βH(x))$ of a Gibbs distribution with the Hamiltonian $H:Ω\rightarrow\{0\}\cup[1,n]$. As shown in [Harris & Kolmogorov 2024], the log-ratio $q=\ln (Z(β_{\max})/Z(β_{\min}))$ can be estimated with accuracy $ε$ using $O(\frac{q \log n}{ε^2})$ calls to an oracle that produces a sample from the Gibbs distribution for parameter $β\in[β_{\min},β_{\max}]$. That algorithm is inherently sequential, or {\em adaptive}: the queried values of $β$ depend on previous samples. Recently, [Liu, Yin & Zhang 2024] developed a non-adaptive version that needs $O( q (\log^2 n) (\log q + \log \log n + ε^{-2}) )$ samples. We improve the number of samples to $O(\frac{q \log^2 n}{ε^2})$ for a non-adaptive algorithm, and to $O(\frac{q \log n}{ε^2})$ for an algorithm that uses just two rounds of adaptivity (matching the complexity of the sequential version). Furthermore, our algorithm simplifies previous techniques. In particular, we use just a single estimator, whereas methods in [Harris & Kolmogorov 2024, Liu, Yin & Zhang 2024] employ two different estimators for different regimes.

61.7 DS Apr 1

Near-Optimal Parallel Approximate Counting via Sampling

David G. Harris, Vladimir Kolmogorov, Hongyang Liu et al.

The computational equivalence between approximate counting and sampling is well established for polynomial-time algorithms. The most efficient general reduction from counting to sampling is achieved via simulated annealing, where the counting problem is formulated in terms of estimating the ratio $Q={Z(β_{\max})}/{Z(β_{\min})}$ between partition functions $Z(β)=\sum_{x\in Ω} \exp(βH(x))$ of Gibbs distributions $μ_β$ over $Ω$ with Hamiltonian $H$, given access to a sampling oracle that produces samples from $μ_β$ for $β\in [β_{\min}, β_{\max}]$. The best bound achieved by known annealing algorithms with relative error $\varepsilon$ is $O(q \log h / \varepsilon^2)$, where $q, h$ are parameters which respectively bound $\ln Q$ and $H$. However, all known algorithms attaining this near-optimal complexity are inherently sequential, or *adaptive*: the queried parameters $β$ depend on previous samples. We develop a simple non-adaptive algorithm for approximate counting using $O(q \log^2 h / \varepsilon^2)$ samples, as well as an algorithm that achieves $O(q \log h / \varepsilon^2)$ samples with just two rounds of adaptivity, matching the best sample complexity of sequential algorithms. These algorithms naturally give rise to work-efficient parallel (RNC) counting algorithms. We discuss applications to RNC counting algorithms for several classic models, including the anti-ferromagnetic 2-spin, monomer-dimer and ferromagnetic Ising models.
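
The annealing reduction in this abstract rests on a standard identity: for two parameters $β < β'$, the ratio $Z(β')/Z(β)$ equals the expectation of $\exp((β'-β)H(x))$ for $x$ drawn from $μ_β$, so $\ln Q$ telescopes along a schedule of parameters. The sketch below applies the classical product estimator with a fixed, evenly spaced schedule to a toy instance. It is not the algorithm of the paper and makes no claim about its sample complexity; the function names, the toy Hamiltonian values, and the exact sampler standing in for the oracle are assumptions for illustration.

```python
import math
import random

def exact_log_Z(H, beta):
    """ln Z(beta) by direct summation; feasible only for a tiny state space."""
    return math.log(sum(math.exp(beta * h) for h in H))

def sample_gibbs(H, beta, n, rng):
    """Stand-in sampling oracle: draws n states from mu_beta and
    returns their Hamiltonian values."""
    weights = [math.exp(beta * h) for h in H]
    return rng.choices(H, weights=weights, k=n)

def estimate_log_ratio(H, beta_min, beta_max, steps, samples_per_step, rng):
    """Estimate ln(Z(beta_max) / Z(beta_min)) with a fixed schedule.

    The schedule is chosen up front, so no queried beta depends on
    earlier samples.
    """
    betas = [beta_min + (beta_max - beta_min) * i / steps
             for i in range(steps + 1)]
    log_ratio = 0.0
    for b, b_next in zip(betas, betas[1:]):
        hs = sample_gibbs(H, b, samples_per_step, rng)
        # Z(b_next)/Z(b) = E_{x ~ mu_b}[exp((b_next - b) * H(x))]
        mean = sum(math.exp((b_next - b) * h) for h in hs) / len(hs)
        log_ratio += math.log(mean)
    return log_ratio

# Hypothetical toy instance: six states with these Hamiltonian values.
H = [0, 1, 2, 3, 5, 8]
rng = random.Random(1)
q_hat = estimate_log_ratio(H, 0.0, 1.0, steps=20, samples_per_step=2000, rng=rng)
q_true = exact_log_Z(H, 1.0) - exact_log_Z(H, 0.0)
print(q_hat, q_true)
```

Since every step uses a parameter fixed in advance, the per-step sampling could run in parallel, which is the property the non-adaptive setting is after.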

DS Jan 15

Scalable Algorithms for Approximate DNF Model Counting

Paul Burkhardt, David G. Harris, Kevin T Schmitt

Model counting of Disjunctive Normal Form (DNF) formulas is a critical problem in applications such as probabilistic inference and network reliability. For example, it is often used for query evaluation in probabilistic databases. Due to the computational intractability of exact DNF counting, there has been a line of research into a variety of approximation algorithms. These include Monte Carlo approaches such as the classical algorithms of Karp, Luby, and Madras (1989), as well as methods based on hashing (Soos et al. 2023), and heuristic approximations based on neural networks (Abboud, Ceylan, and Lukasiewicz 2020). We develop a new Monte Carlo approach with an adaptive stopping rule and short-circuit formula evaluation. We prove it achieves Probably Approximately Correct (PAC) learning bounds and is asymptotically more efficient than the previous methods. We also show experimentally that it outperforms prior algorithms by orders of magnitude, and can scale to much larger problems with millions of variables.
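
For context, the classical Monte Carlo baseline cited in this abstract can be sketched in its textbook coverage form: sample a clause with probability proportional to its number of satisfying assignments, sample a uniform assignment satisfying it, and count the sample only if that clause is the first one the assignment satisfies. This is a minimal sketch of the classical estimator with a fixed sample count, not the new algorithm with an adaptive stopping rule. The clause encoding (a dict from variable index to required truth value), the function names, and the toy formula are assumptions.

```python
import random
from itertools import product

def satisfies(assignment, clause):
    """True if the assignment agrees with every literal of the clause."""
    return all(assignment[v] == val for v, val in clause.items())

def karp_luby_count(clauses, num_vars, num_samples, rng):
    """Estimate the number of satisfying assignments of a DNF formula."""
    # A clause with k distinct literals has 2^(num_vars - k) models.
    sizes = [2 ** (num_vars - len(c)) for c in clauses]
    total = sum(sizes)
    hits = 0
    for _ in range(num_samples):
        i = rng.choices(range(len(clauses)), weights=sizes, k=1)[0]
        # Uniform assignment among those satisfying clause i.
        assignment = [rng.random() < 0.5 for _ in range(num_vars)]
        for v, val in clauses[i].items():
            assignment[v] = val
        # Count the pair only if i is the first satisfied clause.
        first = next(j for j, c in enumerate(clauses) if satisfies(assignment, c))
        if first == i:
            hits += 1
    return total * hits / num_samples

def exact_count(clauses, num_vars):
    """Brute-force count, for checking the estimate on tiny formulas."""
    return sum(any(satisfies(a, c) for c in clauses)
               for a in product([False, True], repeat=num_vars))

# Hypothetical toy formula: (x0 AND x1) OR (NOT x0 AND x2) OR (x1 AND x2).
clauses = [{0: True, 1: True}, {0: False, 2: True}, {1: True, 2: True}]
rng = random.Random(2)
estimate = karp_luby_count(clauses, num_vars=3, num_samples=20000, rng=rng)
exact = exact_count(clauses, num_vars=3)
print(estimate, exact)
```

The estimator is unbiased because each satisfying assignment is counted through exactly one clause, the first one it satisfies.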