Kuikui Liu

2 Papers

LG · Nov 7, 2023
Practical Performance Guarantees for Pipelined DNN Inference

Aaron Archer, Matthew Fahrbach, Kuikui Liu et al.

We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We give practical and effective algorithms for this NP-hard problem, but our emphasis is on tackling the practitioner's dilemma of deciding when a solution is good enough. To this end, we design novel mixed-integer programming (MIP) relaxations for proving lower bounds. Applying these methods to a diverse testbed of 369 production models, for $k \in \{2, 4, 8, 16, 32, 64\}$, we empirically show that these lower bounds are strong enough to be useful in practice. Our lower bounds are substantially stronger than standard combinatorial bounds. For example, evaluated via geometric means across a production testbed with $k = 16$ pipeline stages, our MIP formulations raise the lower bound from 0.4598 to 0.9452, expressed as a fraction of the best partition found. In other words, our improved lower bounds close the optimality gap by a factor of 9.855x.
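The gap-closing factor quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the optimality gap is measured as one minus the lower bound (as a fraction of the best partition found); the variable names are illustrative and not taken from the paper. Because it starts from the rounded figures in the abstract, it reproduces the reported 9.855x only approximately.

```python
# Lower bounds as fractions of the best partition found (rounded values from the abstract).
combinatorial_bound = 0.4598
mip_bound = 0.9452

# Assumption: optimality gap = 1 - (lower bound / best solution found).
gap_before = 1.0 - combinatorial_bound
gap_after = 1.0 - mip_bound

# Factor by which the MIP relaxations shrink the gap.
gap_closing_factor = gap_before / gap_after

print(f"gap before: {gap_before:.4f}")
print(f"gap after:  {gap_after:.4f}")
print(f"factor:     {gap_closing_factor:.2f}x")
```

The small difference from 9.855 is presumably due to rounding in the published bounds.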

76.9 · DS · Apr 2
Subquadratic Counting via Perfect Marginal Sampling

Xiaoyu Chen, Zongchen Chen, Kuikui Liu et al.

We study the computational complexity of approximately computing the partition function of a spin system. Techniques based on standard counting-to-sampling reductions yield $\tilde{O}(n^2)$-time algorithms, where $n$ is the size of the input graph. We present new counting algorithms that break the quadratic-time barrier in a wide range of settings. For example, for the hardcore model of $\lambda$-weighted independent sets in graphs of maximum degree $\Delta$, we obtain an $\tilde{O}(n^{2-\delta})$-time approximate counting algorithm, for some constant $\delta > 0$, when the fugacity $\lambda < \frac{1}{\Delta-1}$, improving over the previous regime of $\lambda = o(\Delta^{-3/2})$ by Anand, Feng, Freifeld, Guo, and Wang (2025). Our results apply broadly to many other spin systems, such as the Ising model, hypergraph independent sets, and vertex colorings. Interestingly, our work reveals a deep connection between $\textit{subquadratic}$ counting and $\textit{perfect}$ marginal sampling. For two-spin systems such as the hardcore and Ising models, we show that the existence of perfect marginal samplers directly yields subquadratic counting algorithms in a $\textit{black-box}$ fashion. For general spin systems, we show that almost all existing perfect marginal samplers can be adapted to produce a sufficiently low-variance marginal estimator in sublinear time, leading to subquadratic counting algorithms.
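To illustrate the standard counting-to-sampling reduction mentioned above, the sketch below recovers the hardcore partition function as a telescoping product of vertex marginals. It is not the paper's algorithm: the marginals are computed exactly by brute force on a tiny graph, whereas a real reduction would estimate each of the $n$ marginals by sampling, which is where the roughly quadratic cost comes from. All function names are hypothetical.

```python
from itertools import combinations

def partition_function(vertices, edges, lam):
    """Brute-force Z = sum over independent sets I of lam^|I| (tiny graphs only)."""
    vertices = list(vertices)
    present = set(vertices)
    # Keep only edges of the subgraph induced by `vertices`.
    induced = [(u, v) for u, v in edges if u in present and v in present]
    total = 0.0
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            if all(not (u in chosen and v in chosen) for u, v in induced):
                total += lam ** size
    return total

def marginal_unoccupied(vertices, edges, lam, v):
    """Exact probability that v is unoccupied: Z(G - v) / Z(G).
    Stand-in for a sampling-based marginal estimator."""
    rest = [u for u in vertices if u != v]
    return partition_function(rest, edges, lam) / partition_function(vertices, edges, lam)

def count_via_marginals(vertices, edges, lam):
    """Z(G) = product over i of 1 / Pr[v_i unoccupied in G minus v_1..v_{i-1}]."""
    remaining = list(vertices)
    z = 1.0
    while remaining:
        v = remaining[0]
        z /= marginal_unoccupied(remaining, edges, lam, v)
        remaining = remaining[1:]  # conditioning on v unoccupied equals deleting v
    return z

# 4-cycle with fugacity 0.5: Z = 1 + 4*lam + 2*lam^2 = 3.5
cycle_vertices = [0, 1, 2, 3]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
z_cycle = count_via_marginals(cycle_vertices, cycle_edges, 0.5)
print(z_cycle)
```

The product telescopes because conditioning a vertex to be unoccupied in the hardcore model is the same as deleting it from the graph, so each factor is a ratio of partition functions of successive subgraphs.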