16.8DSApr 28
An Efficient Streaming Algorithm for Approximating Graphlet DistributionsMarco Bressan, T-H. Hubert Chan, Qipeng Kuang et al.
In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or \emph{$k$-graphlets}, has become central. One approach for this problem is to sample $k$-graphlets randomly. Classic algorithms for $k$-graphlet sampling require loading the entire graph into main memory, making them impractical for massive graphs. To bypass this limitation, Bourreau et al. (NeurIPS 2024) introduced a \emph{streaming} algorithm that through nontrivial techniques makes only $O(\log n)$ passes using $O(n \log n)$ memory. In this work we break their $O(\log n)$-pass bound by giving an algorithm that, for any fixed $c>0$, makes $O(1/c)$ passes using $\tilde O(n^{1+c})$ memory. As a consequence of their lower bound, our algorithm is optimal up to a factor of $\tilde{O}(n^c)$ in the memory usage. We use this sampling algorithm to obtain an efficient method of approximating $k$-graphlet distributions. Experiments on real-world and synthetic graphs show that our algorithm is always at least as good as the one of Bourreau et al., and outperforms it by orders of magnitude on mildly dense graphs.
LGDec 14, 2023
Privacy Amplification by Iteration for ADMM with (Strongly) Convex Objective FunctionsT-H. Hubert Chan, Hao Xie, Mengshi Zhao
We examine a private ADMM variant for (strongly) convex objectives which is a primal-dual iterative method. Each iteration has a user with a private function used to update the primal variable, masked by Gaussian noise for local privacy, without directly adding noise to the dual variable. Privacy amplification by iteration explores if noises from later iterations can enhance the privacy guarantee when releasing final variables after the last iteration. Cyffers et al. [ICML 2023] explored privacy amplification by iteration for the proximal ADMM variant, where a user's entire private function is accessed and noise is added to the primal variable. In contrast, we examine a private ADMM variant requiring just one gradient access to a user's function, but both primal and dual variables must be passed between successive iterations. To apply Balle et al.'s [NeurIPS 2019] coupling framework to the gradient ADMM variant, we tackle technical challenges with novel ideas. First, we address the non-expansive mapping issue in ADMM iterations by using a customized norm. Second, because the dual variables are not masked with any noise directly, their privacy guarantees are achieved by treating two consecutive noisy ADMM iterations as a Markov operator. Our main result is that the privacy guarantee for the gradient ADMM variant can be amplified proportionally to the number of iterations. For strongly convex objective functions, this amplification exponentially increases with the number of iterations. These amplification results align with the previously studied special case of stochastic gradient descent.
CRDec 7, 2021
Locally Differentially Private Sparse Vector AggregationMingxun Zhou, Tianhao Wang, T-H. Hubert Chan et al.
Vector mean estimation is a central primitive in federated analytics. In vector mean estimation, each user $i \in [n]$ holds a real-valued vector $v_i\in [-1, 1]^d$, and a server wants to estimate the mean of all $n$ vectors. Not only so, we would like to protect each individual user's privacy. In this paper, we consider the $k$-sparse version of the vector mean estimation problem, that is, suppose that each user's vector has at most $k$ non-zero coordinates in its $d$-dimensional vector, and moreover, $k \ll d$. In practice, since the universe size $d$ can be very large (e.g., the space of all possible URLs), we would like the per-user communication to be succinct, i.e., independent of or (poly-)logarithmic in the universe size. In this paper, we are the first to show matching upper- and lower-bounds for the $k$-sparse vector mean estimation problem under local differential privacy. Specifically, we construct new mechanisms that achieve asymptotically optimal error as well as succinct communication, either under user-level-LDP or event-level-LDP. We implement our algorithms and evaluate them on synthetic as well as real-world datasets. Our experiments show that we can often achieve one or two orders of magnitude reduction in error in comparison with prior works under typical choices of parameters, while incurring insignificant communication cost.
DSAug 4, 2020
Bucket Oblivious Sort: An Extremely Simple Oblivious SortGilad Asharov, T-H. Hubert Chan, Kartik Nayak et al.
We propose a conceptually simple oblivious sort and oblivious random permutation algorithms called bucket oblivious sort and bucket oblivious random permutation. Bucket oblivious sort uses $6n\log n$ time (measured by the number of memory accesses) and $2Z$ client storage with an error probability exponentially small in $Z$. The above runtime is only $3\times$ slower than a non-oblivious merge sort baseline; for $2^{30}$ elements, it is $5\times$ faster than bitonic sort, the de facto oblivious sorting algorithm in practical implementations.
CRSep 4, 2018
More is Less: Perfectly Secure Oblivious Algorithms in the Multi-Server SettingT-H. Hubert Chan, Jonathan Katz, Kartik Nayak et al.
The problem of Oblivious RAM (ORAM) has traditionally been studied in a single-server setting, but more recently the multi-server setting has also been considered. Yet it is still unclear whether the multi-server setting has any inherent advantages, e.g., whether the multi-server setting can be used to achieve stronger security goals or provably better efficiency than is possible in the single-server case. In this work, we construct a perfectly secure 3-server ORAM scheme that outperforms the best known single-server scheme by a logarithmic factor. In the process, we also show, for the first time, that there exist specific algorithms for which multiple servers can overcome known lower bounds in the single-server setting.
CRFeb 23, 2012
Path ORAM: An Extremely Simple Oblivious RAM ProtocolEmil Stefanov, Marten van Dijk, Elaine Shi et al.
We present Path ORAM, an extremely simple Oblivious RAM protocol with a small amount of client storage. Partly due to its simplicity, Path ORAM is the most practical ORAM scheme known to date with small client storage. We formally prove that Path ORAM has a O(log N) bandwidth cost for blocks of size B = Omega(log^2 N) bits. For such block sizes, Path ORAM is asymptotically better than the best known ORAM schemes with small client storage. Due to its practicality, Path ORAM has been adopted in the design of secure processors since its proposal.