100.0DSMar 29
An Optimal Algorithm for Stochastic Vertex CoverJan van den Brand, Inge Li Gørtz, Chirag Pabbaraju et al.
The goal in the stochastic vertex cover problem is to obtain an approximately minimum vertex cover for a graph $G^\star$ that is realized by sampling each edge independently with some probability $p\in (0, 1]$ in a base graph $G = (V, E)$. The algorithm is given the base graph $G$ and the probability $p$ as inputs, but its only access to the realized graph $G^\star$ is through queries on individual edges in $G$ that reveal the existence (or not) of the queried edge in $G^\star$. In this paper, we resolve the central open question for this problem: to find a $(1+\varepsilon)$-approximate vertex cover using only $O_\varepsilon(n/p)$ edge queries. Prior to our work, there were two incomparable state-of-the-art results for this problem: a $(3/2+\varepsilon)$-approximation using $O_\varepsilon(n/p)$ queries (Derakhshan, Durvasula, and Haghtalab, 2023) and a $(1+\varepsilon)$-approximation using $O_\varepsilon((n/p)\cdot \mathrm{RS}(n))$ queries (Derakhshan, Saneian, and Xun, 2025), where $\mathrm{RS}(n)$ is known to be at least $2^{Ω\left(\frac{\log n}{\log \log n}\right)}$ and could be as large as $\frac{n}{2^{Î(\log^* n)}}$. Our improved upper bound of $O_{\varepsilon}(n/p)$ matches the known lower bound of $Ω(n/p)$ for any constant-factor approximation algorithm for this problem (Behnezhad, Blum, and Derakhshan, 2022). A key tool in our result is a new concentration bound for the size of minimum vertex cover on random graphs, which might be of independent interest.
32.6DSMar 26
Differentially Private Substring and Document Counting with Near-Optimal ErrorGiulia Bernardini, Philip Bille, Inge Li Gørtz et al.
For databases consisting of many text documents, one of the most fundamental data analysis tasks is counting (i) how often a pattern appears as a substring in the database (substring counting) and (ii) how many documents in the collection contain the pattern as a substring (document counting). If such a database contains sensitive data, it is crucial to protect the privacy of individuals in the database. Differential privacy is the gold standard for privacy in data analysis. It gives rigorous privacy guarantees, but comes at the cost of yielding less accurate results. In this paper, we carry out a theoretical study of substring and document counting under differential privacy. We propose a data structure storing $ε$-differentially private counts for all possible query patterns with a maximum additive error of $O(\ell\cdot\mathrm{polylog}(n\ell|Σ|))$, where $\ell$ is the maximum length of a document in the database, $n$ is the number of documents, and $|Σ|$ is the size of the alphabet. We also improve the error bound for document counting with $(ε, δ)$-differential privacy to $O(\sqrt{\ell}\cdot\mathrm{polylog}(n\ell|Σ|))$. We show that our additive errors for substring counting and document counting are optimal up to an $O(\mathrm{polylog}(n\ell))$ factor both for $ε$-differential privacy and $(ε, δ)$-differential privacy. Our data structures immediately lead to improved algorithms for related problems, such as privately mining frequent substrings and q-grams. Additionally, we develop a new technique of independent interest for differentially privately computing a general class of counting functions on trees.