Thomas Humphries

CR
h-index7
8papers
84citations
Novelty56%
AI Score46

8 Papers

58.0LGMay 29
PE-means: Improved Differentially Private $k$-means Clustering through Private Evolution

Thomas Humphries, Zinan Lin, Sergey Yekhanin

We study the problem of differentially private (DP) $k$-means clustering in Euclidean space. Previous solutions rely on summing the private data directly, which induces a sensitivity proportional to the domain. We introduce PE-means, an extension of the private evolution (PE) algorithm (an increasingly popular method for synthetic data generation), to the problem of $k$-means clustering. The key advantage of PE is that it only computes a private histogram with constant sensitivity to guide the evolution. Our adaptation of PE includes new evolutionary operators for clustering, as well as other algorithmic improvements of independent interest. Overall, PE-means achieves an average improvement of 20% in clustering loss over state-of-the-art baselines.

CRJun 14, 2023
Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions

Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries et al.

Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows between $3$ and $110\times$ speedups in inference time on large models with up to $23$ million parameters while maintaining competitive inference accuracy.

24.1CRApr 8
Interpreting the Error of Differentially Private Median Queries through Randomization Intervals

Thomas Humphries, Tim Li, Shufan Zhang et al.

It can be difficult for practitioners to interpret the quality of differentially private (DP) statistics due to the added noise. One method to help analysts understand the amount of error introduced by DP is to return a Randomization Interval (RI), along with the statistic. A RI is a type of confidence interval that bounds the error introduced by DP. For queries where the noise distribution depends on the input, such as the median, prior work degrades the quality of the median itself to obtain a high-quality RI. In this work, we propose PostRI, a solution to compute a RI after the median has been estimated. PostRI enables a median estimation with 14%-850% higher utility than related work, while maintaining a narrow RI.

CRMay 3, 2024
FastLloyd: Federated, Accurate, Secure, and Tunable $k$-Means Clustering with Differential Privacy

Abdulrahman Diaa, Thomas Humphries, Florian Kerschbaum

We study the problem of privacy-preserving $k$-means clustering in the horizontally federated setting. Existing federated approaches using secure computation suffer from substantial overheads and do not offer output privacy. At the same time, differentially private (DP) $k$-means algorithms either assume a trusted central curator or significantly degrade utility by adding noise in the local DP model. Naively combining the secure and central DP solutions results in a protocol with impractical overhead. Instead, our work provides enhancements to both the DP and secure computation components, resulting in a design that is faster, more private, and more accurate than previous work. By utilizing the computational DP model, we design a lightweight, secure aggregation-based approach that achieves five orders of magnitude speed-up over state-of-the-art related work. Furthermore, we not only maintain the utility of the state-of-the-art in the central model of DP, but we improve the utility further by designing a new DP clustering mechanism.

IVDec 23, 2021
Self-Attention Generative Adversarial Network for Iterative Reconstruction of CT Images

Ruiwen Xing, Thomas Humphries, Dong Si

Computed tomography (CT) uses X-ray measurements taken from sensors around the body to generate tomographic images of the human body. Conventional reconstruction algorithms can be used if the X-ray data are adequately sampled and of high quality; however, concerns such as reducing dose to the patient, or geometric limitations on data acquisition, may result in low quality or incomplete data. Images reconstructed from these data using conventional methods are of poor quality, due to noise and other artifacts. The aim of this study is to train a single neural network to reconstruct high-quality CT images from noisy or incomplete CT scan data, including low-dose, sparse-view, and limited-angle scenarios. To accomplish this task, we train a generative adversarial network (GAN) as a signal prior, to be used in conjunction with the iterative simultaneous algebraic reconstruction technique (SART) for CT data. The network includes a self-attention block to model long-range dependencies in the data. We compare our Self-Attention GAN for CT image reconstruction with several state-of-the-art approaches, including denoising cycle GAN, CIRCLE GAN, and a total variation superiorized algorithm. Our approach is shown to have comparable overall performance to CIRCLE GAN, while outperforming the other two approaches.

CRJul 26, 2021
Selective MPC: Distributed Computation of Differentially Private Key-Value Statistics

Thomas Humphries, Rasoul Akhavan Mahdavi, Shannon Veitch et al.

Key-value data is a naturally occurring data type that has not been thoroughly investigated in the local trust model. Existing local differentially private (LDP) solutions for computing statistics over key-value data suffer from the inherent accuracy limitations of each user adding their own noise. Multi-party computation (MPC) maintains better accuracy than LDP and similarly does not require a trusted central party. However, naively applying MPC to key-value data results in prohibitively expensive computation costs. In this work, we present selective multi-party computation, a novel approach to distributed computation that leverages DP leakage to efficiently and accurately compute statistics over key-value data. By providing each party with a view of a random subset of the data, we can capture subtractive noise. We prove that our protocol satisfies pure DP and is provably secure in the combined DP/MPC model. Our empirical evaluation demonstrates that we can compute statistics over 10,000 keys in 20 seconds and can scale up to 30 servers while obtaining results for a single key in under a second.

CROct 23, 2020
Investigating Membership Inference Attacks under Data Dependencies

Thomas Humphries, Simon Oya, Lindsey Tulloch et al.

Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $ε$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.

NAMay 12, 2015
Convergence analysis of a polyenergetic SART algorithm

Thomas Humphries

Purpose: We analyze a recently proposed polyenergetic version of the simultaneous algebraic reconstruction technique (SART). This algorithm, denoted pSART, replaces the monoenergetic forward projection operation used by SART with a post-log, polyenergetic forward projection, while leaving the rest of the algorithm unchanged. While the proposed algorithm provides good results empirically, convergence of the algorithm was not established mathematically in the original paper. Methods: We analyze pSART as a nonlinear fixed point iteration by explicitly computing the Jacobian of the iteration. A necessary condition for convergence is that the spectral radius of the Jacobian, evaluated at the fixed point, is less than one. A short proof of convergence for SART is also provided as a basis for comparison. Results: We show that the pSART algorithm is not guaranteed to converge, in general. The Jacobian of the iteration depends on several factors, including the system matrix and how one models the energy dependence of the linear attenuation coefficient. We provide a simple numerical example that shows that the spectral radius of the Jacobian matrix is not guaranteed to be less than one. A second set of numerical experiments using realistic CT system matrices, however, indicates that conditions for convergence are likely to be satisfied in practice. Conclusion: Although pSART is not mathematically guaranteed to converge, our numerical experiments indicate that it will tend to converge at roughly the same rate as SART for system matrices of the type encountered in CT imaging. Thus we conclude that the algorithm is still a useful method for reconstruction of polyenergetic CT data.