Philip Bille

DSMar 26

Differentially Private Substring and Document Counting with Near-Optimal Error

Giulia Bernardini, Philip Bille, Inge Li Gørtz et al.

For databases consisting of many text documents, one of the most fundamental data analysis tasks is counting (i) how often a pattern appears as a substring in the database (substring counting) and (ii) how many documents in the collection contain the pattern as a substring (document counting). If such a database contains sensitive data, it is crucial to protect the privacy of individuals in the database. Differential privacy is the gold standard for privacy in data analysis. It gives rigorous privacy guarantees, but comes at the cost of yielding less accurate results. In this paper, we carry out a theoretical study of substring and document counting under differential privacy. We propose a data structure storing $Îµ$-differentially private counts for all possible query patterns with a maximum additive error of $O(\ell\cdot\mathrm{polylog}(n\ell|Î£|))$, where $\ell$ is the maximum length of a document in the database, $n$ is the number of documents, and $|Î£|$ is the size of the alphabet. We also improve the error bound for document counting with $(Îµ, Î´)$-differential privacy to $O(\sqrt{\ell}\cdot\mathrm{polylog}(n\ell|Î£|))$. We show that our additive errors for substring counting and document counting are optimal up to an $O(\mathrm{polylog}(n\ell))$ factor both for $Îµ$-differential privacy and $(Îµ, Î´)$-differential privacy. Our data structures immediately lead to improved algorithms for related problems, such as privately mining frequent substrings and q-grams. Additionally, we develop a new technique of independent interest for differentially privately computing a general class of counting functions on trees.

CVMay 13

Fast and Compact Graph Cuts for the Boykov-Kolmogorov Algorithm

Christian Møller Mikkelstrup, Anders Bjorholm Dahl, Philip Bille et al.

Computing a minimum $s$-$t$ cut in a graph is a solution to a wide range of computer vision problems, and is often done using the Boykov-Kolmogorov (BK) algorithm. In this paper, we revisit the BK algorithm from both a theoretical and practical point of view. We improve the analysis of the time complexity of the BK algorithm to $O(mn|C|)$ and propose a new algorithm, the fast and compact BK (fcBK) algorithm, with a time complexity of $O(m|C|)$, where $m$, $n$, and $|C|$ are the number of edges, number of vertices, and the capacity of the cut, respectively. We additionally propose a compact graph representation that allows our implementation to find a minimum $s$-$t$ cut in a graph with upwards of $10^9$ vertices and $10^{10}$ edges on a machine with 128 GB of memory. We find our implementation of the BK algorithm to be the fastest available implementation of the BK algorithm when evaluating on a comprehensive set of benchmark datasets, highlighting the importance of memory-efficient implementations. We make our implementations publicly available for further research and implementation development within minimum $s$-$t$ cut algorithms.

Philip Bille

2 Papers