Mónika Csikós

1.2DSSep 2, 2022

Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple, and Practical

Mónika Csikós, Nabil H. Mustafa

We study one of the key tools in data approximation and optimization: low-discrepancy colorings. Formally, given a finite set system $(X,\mathcal S)$, the \emph{discrepancy} of a two-coloring $χ:X\to\{-1,1\}$ is defined as $\max_{S \in \mathcal S}|{χ(S)}|$, where $χ(S)=\sum\limits_{x \in S}χ(x)$. We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal S)$ with dual shatter function $π^*(k)=O(k^d)$, returns a coloring with expected discrepancy $O\left({\sqrt{|X|^{1-1/d}\log|\mathcal S|}}\right)$ (this bound is tight) in time $\tilde O\left({|\mathcal S|\cdot|X|^{1/d}+|X|^{2+1/d}}\right)$, improving upon the previous-best time of $O\left(|\mathcal S|\cdot|X|^3\right)$ by at least a factor of $|X|^{2-1/d}$ when $|\mathcal S|\geq|X|$. This setup includes many geometric classes, families of bounded dual VC-dimension, and others. As an immediate consequence, we obtain an improved algorithm to construct $\varepsilon$-approximations of sub-quadratic size. Our method uses primal-dual reweighing with an improved analysis of randomly updated weights and exploits the structural properties of the set system via matchings with low crossing number -- a fundamental structure in computational geometry. In particular, we get the same $|X|^{2-1/d}$ factor speed-up on the construction time of matchings with crossing number $O\left({|X|^{1-1/d}}\right)$, which is the first improvement since the 1980s. The proposed algorithms are very simple, which makes it possible, for the first time, to compute colorings with near-optimal discrepancies and near-optimal sized approximations for abstract and geometric set systems in dimensions higher than $2$.

3.3LGAug 20, 2020

Optimal Approximations Made Easy

Mónika Csikós, Nabil H. Mustafa

The fundamental result of Li, Long, and Srinivasan on approximations of set systems has become a key tool across several communities such as learning theory, algorithms, computational geometry, combinatorics and data analysis. The goal of this paper is to give a modular, self-contained, intuitive proof of this result for finite set systems. The only ingredient we assume is the standard Chernoff's concentration bound. This makes the proof accessible to a wider audience, readers not familiar with techniques from statistical learning theory, and makes it possible to be covered in a single self-contained lecture in a geometry, algorithms or combinatorics course.

4.7LGJul 20, 2018

Optimal Bounds on the VC-dimension

Monika Csikos, Andrey Kupavskii, Nabil H. Mustafa

The VC-dimension of a set system is a way to capture its complexity and has been a key parameter studied extensively in machine learning and geometry communities. In this paper, we resolve two longstanding open problems on bounding the VC-dimension of two fundamental set systems: $k$-fold unions/intersections of half-spaces, and the simplices set system. Among other implications, it settles an open question in machine learning that was first studied in the 1989 foundational paper of Blumer, Ehrenfeucht, Haussler and Warmuth as well as by Eisenstat and Angluin and Johnson.

Mónika Csikós

3 Papers