Adam Sealfon

CR
h-index38
5papers
121citations
Novelty54%
AI Score48

5 Papers

LGMar 10
Denoising the US Census: Succinct Block Hierarchical Regression

Badih Ghazi, Pritish Kamath, Ravi Kumar et al.

The US Census Bureau Disclosure Avoidance System (DAS) balances confidentiality and utility requirements for the decennial US Census (Abowd et al., 2022). The DAS was used in the 2020 Census to produce demographic datasets critically used for legislative apportionment and redistricting, federal and state funding allocation, municipal and infrastructure planning, and scientific research. At the heart of DAS is TopDown, a heuristic post-processing method that combines billions of private noisy measurements across six geographic levels in order to produce new estimates that are consistent, more accurate, and satisfy certain structural constraints on the data. In this work, we introduce BlueDown, a new post-processing method that produces more accurate, consistent estimates while satisfying the same privacy guarantees and structural constraints. We obtain especially large accuracy improvements for aggregates at the county and tract levels on evaluation metrics proposed by the US Census Bureau. From a technical perspective, we develop a new algorithm for generalized least-squares regression that leverages the hierarchical structure of the measurements and that is statistically optimal among linear unbiased estimators. This reduces the computational dependence on the number of geographic regions measured from matrix multiplication time, which would be infeasible for census-scale data, to linear time. We incorporate the additional structural constraints by combining this regression algorithm with an optimization routine that extends TDA to support correlated measurements. We further improve the efficiency of our algorithm using succinct linear-algebraic operations that exploit symmetries in the structure of the measurements and constraints. We believe our hierarchical regression and succinct operations to be of independent interest.

CRMay 3
LAPRAS : Learning-Augmented PRivate Answering for linear query Streams

Pranay Mundra, Adam Sealfon, Ziteng Sun et al.

Modern database workloads are highly predictable: query streams are dominated by recurring jobs and templates, even when their arrival order is not known in advance. This motivates a learning-augmented view of online differentially private (DP) analytics: can algorithms utilize predictions about which queries will occur to improve utility under a single global privacy budget, while remaining robust when predictions are wrong? We study online DP query answering, where a curator must answer a stream $Q$ of $S$ linear queries arriving in uniformly random order under privacy budget $(ε,δ)$. We present LAPRAS, which assumes access to an oracle that outputs a prediction set of queries likely to appear in the stream and uses it to guide privacy spending. LAPRAS answers predicted queries using the offline-optimal Matrix Mechanism and answers the remaining queries online from a residual budget. To pace spending across an unknown number of unpredicted queries, we introduce Smooth Allocation, which forms an unbiased stopping-time estimate $\widehat{B}$ from the first $T=Θ(\log^2 S)$ unpredicted queries and continuously recalibrates per-query expenditure. Empirically, over two real datasets, we validate the intended consistency--robustness trade-off: LAPRAS achieves near-offline utility under high overlap and degrades gracefully to baseline-level performance when overlap is low.

LGJun 5, 2025
Urania: Differentially Private Insights into AI Use

Daogao Liu, Edith Cohen, Badih Ghazi et al.

We introduce $Urania$, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy (DP) guarantees. The framework employs a private clustering mechanism and innovative keyword extraction methods, including frequency-based, TF-IDF-based, and LLM-guided approaches. By leveraging DP tools such as clustering, partition selection, and histogram-based summarization, $Urania$ provides end-to-end privacy protection. Our evaluation assesses lexical and semantic content preservation, pair similarity, and LLM-based metrics, benchmarking against a non-private Clio-inspired pipeline (Tamkin et al., 2024). Moreover, we develop a simple empirical privacy evaluation that demonstrates the enhanced robustness of our DP pipeline. The results show the framework's ability to extract meaningful conversational insights while maintaining stringent user privacy, effectively balancing data utility with privacy preservation.

DSMay 24, 2019
Efficiently Estimating Erdos-Renyi Graphs with Node Differential Privacy

Adam Sealfon, Jonathan Ullman

We give a simple, computationally efficient, and node-differentially-private algorithm for estimating the parameter of an Erdos-Renyi graph---that is, estimating p in a G(n,p)---with near-optimal accuracy. Our algorithm nearly matches the information-theoretically optimal exponential-time algorithm for the same problem due to Borgs et al. (FOCS 2018). More generally, we give an optimal, computationally efficient, private algorithm for estimating the edge-density of any graph whose degree distribution is concentrated on a small interval.

CRNov 14, 2015
Shortest Paths and Distances with Differential Privacy

Adam Sealfon

We introduce a model for differentially private analysis of weighted graphs in which the graph topology $(V,E)$ is assumed to be public and the private information consists only of the edge weights $w:E\to\mathbb{R}^+$. This can express hiding congestion patterns in a known system of roads. Differential privacy requires that the output of an algorithm provides little advantage, measured by privacy parameters $ε$ and $δ$, for distinguishing between neighboring inputs, which are thought of as inputs that differ on the contribution of one individual. In our model, two weight functions $w,w'$ are considered to be neighboring if they have $\ell_1$ distance at most one. We study the problems of privately releasing a short path between a pair of vertices and of privately releasing approximate distances between all pairs of vertices. We are concerned with the approximation error, the difference between the length of the released path or released distance and the length of the shortest path or actual distance. For privately releasing a short path between a pair of vertices, we prove a lower bound of $Ω(|V|)$ on the additive approximation error for fixed $ε,δ$. We provide a differentially private algorithm that matches this error bound up to a logarithmic factor and releases paths between all pairs of vertices. The approximation error of our algorithm can be bounded by the number of edges on the shortest path, so we achieve better accuracy than the worst-case bound for vertex pairs that are connected by a low-weight path with $o(|V|)$ vertices. For privately releasing all-pairs distances, we show that for trees we can release all distances with approximation error $O(\log^{2.5}|V|)$ for fixed privacy parameters. For arbitrary bounded-weight graphs with edge weights in $[0,M]$ we can release all distances with approximation error $\tilde{O}(\sqrt{|V|M})$.