Hayim Shaul

5papers

112citations

Novelty56%

AI Score32

Ranked #138,937 of 205,806 authors (top 68%)#3,816 in CR (top 52%)

5 Papers

CROct 30, 2023

Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection

Swanand Ravindra Kadhe, Heiko Ludwig, Nathalie Baracaldo et al.

The effective detection of evidence of financial anomalies requires collaboration among multiple entities who own a diverse set of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontally partitioned across the entities. However, in real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally and hence it is not possible to use existing FL approaches in a plug-and-play manner. Our novel solution, PV4FAD, combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), differential privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference threats at model deployment time. Our solution provides input privacy through HE and SMPC, and output privacy against inference time attacks through DP. Specifically, we show that, in the honest-but-curious threat model, banks do not learn any sensitive features about PNS transactions, and the PNS does not learn any information about the banks' dataset but only learns prediction labels. We also develop and analyze a DP mechanism to protect output privacy during inference. Our solution generates high-utility models by significantly reducing the per-bank noise level while satisfying distributed DP. To ensure high accuracy, our approach produces an ensemble model, in particular, a random forest. This enables us to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.

CRJul 7, 2022

Efficient Pruning for Machine Learning Under Homomorphic Encryption

Ehud Aharoni, Moran Baruch, Pradip Bose et al.

Privacy-preserving machine learning (PPML) solutions are gaining widespread popularity. Among these, many rely on homomorphic encryption (HE) that offers confidentiality of the model and the data, but at the cost of large latency and memory requirements. Pruning neural network (NN) parameters improves latency and memory in plaintext ML but has little impact if directly applied to HE-based PPML. We introduce a framework called HE-PEx that comprises new pruning methods, on top of a packing technique called tile tensors, for reducing the latency and memory of PPML inference. HE-PEx uses permutations to prune additional ciphertexts, and expansion to recover inference loss. We demonstrate the effectiveness of our methods for pruning fully-connected and convolutional layers in NNs on PPML tasks, namely, image compression, denoising, and classification, with autoencoders, multilayer perceptrons (MLPs) and convolutional neural networks (CNNs). We implement and deploy our networks atop a framework called HElayers, which shows a 10-35% improvement in inference speed and a 17-35% decrease in memory requirement over the unpruned network, corresponding to 33-65% fewer ciphertexts, within a 2.5% degradation in inference accuracy over the unpruned network. Compared to the state-of-the-art pruning technique for PPML, our techniques generate networks with 70% fewer ciphertexts, on average, for the same degradation limit.

DSJan 22, 2018Code

Secure $k$-ish Nearest Neighbors Classifier

Hayim Shaul, Dan Feldman, Daniela Rus

In machine learning, classifiers are used to predict a class of a given query based on an existing (classified) database. Given a database S of n d-dimensional points and a d-dimensional query q, the k-nearest neighbors (kNN) classifier assigns q with the majority class of its k nearest neighbors in S. In the secure version of kNN, S and q are owned by two different parties that do not want to share their data. Unfortunately, all known solutions for secure kNN either require a large communication complexity between the parties, or are very inefficient to run. In this work we present a classifier based on kNN, that can be implemented efficiently with homomorphic encryption (HE). The efficiency of our classifier comes from a relaxation we make on kNN, where we allow it to consider kappa nearest neighbors for kappa ~ k with some probability. We therefore call our classifier k-ish Nearest Neighbors (k-ish NN). The success probability of our solution depends on the distribution of the distances from q to S and increase as its statistical distance to Gaussian decrease. To implement our classifier we introduce the concept of double-blinded coin-toss. In a doubly-blinded coin-toss the success probability as well as the output of the toss are encrypted. We use this coin-toss to efficiently approximate the average and variance of the distances from q to S. We believe these two techniques may be of independent interest. When implemented with HE, the k-ish NN has a circuit depth that is independent of n, therefore making it scalable. We also implemented our classifier in an open source library based on HELib and tested it on a breast tumor database. The accuracy of our classifier (F_1 score) were 98\% and classification took less than 3 hours compared to (estimated) weeks in current HE implementations.

CRAug 19, 2017Code

Secure Search on the Cloud via Coresets and Sketches

Adi Akavia, Dan Feldman, Hayim Shaul

\emph{Secure Search} is the problem of retrieving from a database table (or any unsorted array) the records matching specified attributes, as in SQL SELECT queries, but where the database and the query are encrypted. Secure search has been the leading example for practical applications of Fully Homomorphic Encryption (FHE) starting in Gentry's seminal work; however, to the best of our knowledge all state-of-the-art secure search algorithms to date are realized by a polynomial of degree $Ω(m)$ for $m$ the number of records, which is typically too slow in practice even for moderate size $m$. In this work we present the first algorithm for secure search that is realized by a polynomial of degree polynomial in $\log m$. We implemented our algorithm in an open source library based on HELib implementation for the Brakerski-Gentry-Vaikuntanthan's FHE scheme, and ran experiments on Amazon's EC2 cloud. Our experiments show that we can retrieve the first match in a database of millions of entries in less than an hour using a single machine; the time reduced almost linearly with the number of machines. Our result utilizes a new paradigm of employing coresets and sketches, which are modern data summarization techniques common in computational geometry and machine learning, for efficiency enhancement for homomorphic encryption. As a central tool we design a novel sketch that returns the first positive entry in a (not necessarily sparse) array; this sketch may be of independent interest.

CRNov 3, 2020

HeLayers: A Tile Tensors Framework for Large Neural Networks on Encrypted Data

Ehud Aharoni, Allon Adir, Moran Baruch et al.

Privacy-preserving solutions enable companies to offload confidential data to third-party services while fulfilling their government regulations. To accomplish this, they leverage various cryptographic techniques such as Homomorphic Encryption (HE), which allows performing computation on encrypted data. Most HE schemes work in a SIMD fashion, and the data packing method can dramatically affect the running time and memory costs. Finding a packing method that leads to an optimal performant implementation is a hard task. We present a simple and intuitive framework that abstracts the packing decision for the user. We explain its underlying data structures and optimizer, and propose a novel algorithm for performing 2D convolution operations. We used this framework to implement an HE-friendly version of AlexNet, which runs in three minutes, several orders of magnitude faster than other state-of-the-art solutions that only use HE.