DSMay 5
Nearly-Tight Bounds for Zonotope Containment and BeyondFriedrich Eisenbrand, Thomas Rothvoss, Matteo Russo et al.
We investigate the convex-body containment problem $\max\{s >0 : s Z \subseteq Q\}$, where the outer body $Q \subseteq \mathbb R^d$ is described by a membership oracle and the inner body $Z \subseteq \mathbb R^d$ is a zonotope. Our main result is a sampling-based $O(\sqrt{d})$-approximation algorithm for this problem that almost matches the lower bound of $Ω(\sqrt{d/\log d})$ by Khot and Naor in the oracle model. Assuming zonotopes can be sparsified by a linear number of generators, which is referred to as Talagrand conjecture, our approach attains the optimal approximation factor of $Θ(\sqrt{d/\log d})$. Our second main result is a proof of Talagrand's conjecture for $Δ$-modular zonotopes whenever $Δ$ is constant. Those zonotopes are of the form $Z = \{ Wx \colon \| x\|_\infty \leq 1\}$ where the non-zero $d \times d$ sub-determinants of $W$ are between $1$ and $Δ$. This result establishes a connection between zonoid sparsification and spectral sparsification of Batson, Spielman and Srivastava. We complement these results with a universal $Ω(\sqrt{d/\log d})$ lower bound holding for all zonotopes. Finally, we consider containment problems $\max\{s >0 : s K \subseteq Q\}$, for general convex bodies $K \subseteq \mathbb R^d$. A result of Naszódi on approximating $K \subseteq \mathbb R^d$ by a polytope implies a $Θ(d/\log d)$ approximation algorithm in polynomial time. We show the tightness of this approximation factor in the oracle model via a reduction to the circumradius computation. Our lower bound holds for centrally symmetric convex sets, implying that Barvinok's optimal $O(\sqrt{d})$-approximation of a centrally symmetric convex body by a polytope with a polynomial number of vertices cannot be computed in polynomial time.
OCMar 27
The Subspace Flatness Conjecture and Faster Integer ProgrammingVictor Reis, Thomas Rothvoss
In a seminal paper, Kannan and Lovász (1988) considered a quantity $μ_{KL}(Î,K)$ which denotes the best volume-based lower bound on the covering radius $μ(Î,K)$ of a convex body $K$ with respect to a lattice $Î$. Kannan and Lovász proved that $μ(Î,K) \leq n \cdot μ_{KL}(Î,K)$ and the Subspace Flatness Conjecture by Dadush (2012) claims a $O(\log(2n))$ factor suffices, which would match the lower bound from the work of Kannan and Lovász. We settle this conjecture up to a constant in the exponent by proving that $μ(Î,K) \leq O(\log^{3}(2n)) \cdot μ_{KL} (Î,K)$. Our proof is based on the Reverse Minkowski Theorem due to Regev and Stephens-Davidowitz (2017). Following the work of Dadush (2012, 2019), we obtain a $(\log(2n))^{O(n)}$-time randomized algorithm to solve integer programs in $n$ variables. Another implication of our main result is a near-optimal flatness constant of $O(n \log^{2}(2n))$, improving on the previous bound of $O(n^{4/3} \log^{O(1)} (2n))$.
LGOct 12, 2023
Stronger Coreset Bounds for Kernel Density Estimators via ChainingRainie Bozzai, Thomas Rothvoss
We apply the discrepancy method and a chaining approach to give improved bounds on the coreset complexity of a wide class of kernel functions. Our results give randomized polynomial time algorithms to produce coresets of size $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Gaussian and Laplacian kernels in the case that the data set is uniformly bounded, an improvement that was not possible with previous techniques. We also obtain coresets of size $O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Laplacian kernel for $d$ constant. Finally, we give the best known bounds of $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,α\})}\big)$ on the coreset complexity of the exponential, Hellinger, and JS Kernels, where $1/α$ is the bandwidth parameter of the kernel.
LGJan 11, 2025Code
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy TheoryJerry Chee, Arturs Backurs, Rainie Heck et al.
Quantizing the weights of a neural network has two steps: (1) Finding a good low bit-complexity representation for weights (which we call the quantization grid) and (2) Rounding the original weights to values in the quantization grid. In this paper, we study the problem of rounding optimally given any quantization grid. The simplest and most commonly used way to round is Round-to-Nearest (RTN). By rounding in a data-dependent way instead, one can improve the quality of the quantized model significantly. We study the rounding problem from the lens of \emph{discrepancy theory}, which studies how well we can round a continuous solution to a discrete solution without affecting solution quality too much. We prove that given $m=\mathrm{poly}(1/ε)$ samples from the data distribution, we can round all but $O(m)$ model weights such that the expected approximation error of the quantized model on the true data distribution is $\le ε$ as long as the space of gradients of the original model is approximately low rank (which we empirically validate). Our proof, which is algorithmic, inspired a simple and practical rounding algorithm called \emph{DiscQuant}. In our experiments, we demonstrate that DiscQuant significantly improves over the prior state-of-the-art rounding method called GPTQ and the baseline RTN over a range of benchmarks on Phi3mini-3.8B and Llama3.1-8B. For example, rounding Phi3mini-3.8B to a fixed quantization grid with 3.25 bits per parameter using DiscQuant gets 64\% accuracy on the GSM8k dataset, whereas GPTQ achieves 54\% and RTN achieves 31\% (the original model achieves 84\%). We make our code available at https://github.com/jerry-chee/DiscQuant.