Fabian Boemer

h-index 13

7 papers

685 citations

Novelty 43%

AI Score 28

Ranked #149,616 of 194,257 authors (top 77%) · #3,971 in CR (top 59%)

7 Papers

17.4 · CR · Jul 10 · Code

Wally: Batched Private Nearest Neighbor Search at Scale

Hilal Asi, Fabian Boemer, Nicholas Genise et al. · apple-ml

We present Wally, a batched private nearest-neighbor search protocol that uses differential privacy to break the linear computation barrier of fully-oblivious schemes. In a fully-oblivious scheme such as Tiptoe, the server must process the entire database for every query to hide the access pattern, resulting in low throughput (909 QPS) and high communication (17.4 MB) on a 3.2M-entry database. Sublinear alternatives such as Pacmann require 614 MB of client storage and an offline streaming phase. Wally's key insight is that, while fully-oblivious schemes become prohibitively expensive at scale, scale also provides an opportunity: large-scale systems naturally have many concurrent clients. Wally batches queries from non-coordinating clients, each of which independently adds fake queries to hide which clusters it accesses. The fake-query counts follow a negative binomial distribution, which is non-negative and infinitely divisible, so clients can sample independently without coordination. Clients send queries at random times through an existing anonymization service, avoiding the need for a centralized shuffler. The server sees only an anonymized, noisy stream of cluster accesses that is provably (ε, δ)-differentially private, and it computes over only the relevant clusters. The client encrypts its query under somewhat homomorphic encryption (SHE), so the server returns only encrypted similarity scores. On a 3.2M-entry database with 500K-query batches, Wally achieves 7-29x higher throughput and 6.7-31x lower communication than Tiptoe, and 15,000x lower client storage than Pacmann, with strong (ε = 0.1, δ = 2^-26)-DP and comparable accuracy. We also propose optimizations to SHE and keyword PIR that yield 2-3x improvements in PIR and 20-25% in BFV operations, and we release an open-source BFV library in Swift.
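The no-coordination claim rests on infinite divisibility: if each of n clients independently draws its fake-query count from NB(r/n, p), the total across clients is distributed as NB(r, p). The following is a minimal Python sketch that checks this numerically by convolving the per-client distributions. It assumes toy parameters (r = 3, p = 0.6, four clients) that are not Wally's, and it does not model the DP calibration of r and p or how fake queries map to clusters. The helpers `nb_pmf`, `convolve`, and `sample_nb` are hypothetical.

```python
import math
import random


def nb_pmf(k: int, r: float, p: float) -> float:
    """P(K = k) for K ~ NegBin(r, p): failures before the r-th success; r may be fractional."""
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def convolve(a: list[float], b: list[float], kmax: int) -> list[float]:
    """PMF of the sum of two independent non-negative counts, truncated to 0..kmax (exact there)."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(kmax + 1)]


def sample_nb(r: float, p: float, rng: random.Random) -> int:
    """Draw one NegBin(r, p) value as a Gamma-Poisson mixture (Knuth's Poisson step; small means only)."""
    lam = rng.gammavariate(r, (1 - p) / p)  # Gamma(shape=r, scale=(1-p)/p)
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


# Toy parameters (assumptions, not Wally's): total shape r, success probability p.
r_total, p, n_clients, kmax = 3.0, 0.6, 4, 30

# Each client uses shape r_total / n_clients without talking to the others.
per_client = [nb_pmf(k, r_total / n_clients, p) for k in range(kmax + 1)]
aggregate = per_client
for _ in range(n_clients - 1):
    aggregate = convolve(aggregate, per_client, kmax)

# The summed noise should match a single NB(r_total, p) draw.
target = [nb_pmf(k, r_total, p) for k in range(kmax + 1)]
max_err = max(abs(x - y) for x, y in zip(aggregate, target))
print(f"max |aggregate - target| = {max_err:.2e}")

# One client's independent draw of its fake-query count.
rng = random.Random(0)
print("fake queries for one client:", sample_nb(r_total / n_clients, p, rng))
```

Sampling via a Gamma-Poisson mixture keeps the sketch to the standard library and allows a fractional shape r/n. A production system would use a vetted sampler, and the DP analysis would have to account for its exact behavior.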

21.0 · CR · Mar 30, 2021 · Code

Intel HEXL: Accelerating Homomorphic Encryption with Intel AVX512-IFMA52

Fabian Boemer, Sejun Kim, Gelila Seifu et al.

Modern implementations of homomorphic encryption (HE) rely heavily on polynomial arithmetic over a finite field. This is particularly true of the CKKS, BFV, and BGV HE schemes. Two of the biggest performance bottlenecks in HE primitives and applications are polynomial modular multiplication and the forward and inverse number-theoretic transform (NTT). Here, we introduce Intel Homomorphic Encryption Acceleration Library (Intel HEXL), a C++ library which provides optimized implementations of polynomial arithmetic for Intel processors. Intel HEXL takes advantage of the recent Intel Advanced Vector Extensions 512 (Intel AVX512) instruction set to provide state-of-the-art implementations of the NTT and modular multiplication. On the forward and inverse NTT, Intel HEXL provides up to 7.2x and 6.7x speedup, respectively, over a native C++ implementation. Intel HEXL also provides up to 6.0x speedup on the element-wise vector-vector modular multiplication, and 1.7x speedup on the element-wise vector-scalar modular multiplication. Intel HEXL is available open-source at https://github.com/intel/hexl under the Apache 2.0 license and has been adopted by the Microsoft SEAL and PALISADE homomorphic encryption libraries.
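The two kernels named in the abstract fit together as follows. The negacyclic NTT maps a polynomial in Z_q[X]/(X^N + 1) to its evaluations at the odd powers of a primitive 2N-th root of unity ψ. In that domain, polynomial multiplication becomes element-wise modular multiplication. The minimal Python reference below is naive O(N^2) and uses toy parameters (q = 17, N = 4, ψ = 2), all of which are illustrative assumptions. It is not HEXL's implementation or API: HEXL uses fast O(N log N) transforms vectorized with AVX512 and much larger moduli.

```python
def negacyclic_ntt(a: list[int], q: int, psi: int) -> list[int]:
    """A[j] = a(psi^(2j+1)) mod q: evaluate at the N roots of X^N + 1 (naive O(N^2))."""
    n = len(a)
    return [sum(a[i] * pow(psi, (2 * j + 1) * i, q) for i in range(n)) % q for j in range(n)]


def inverse_negacyclic_ntt(A: list[int], q: int, psi: int) -> list[int]:
    """Invert the transform: a[i] = N^-1 * sum_j A[j] * psi^(-(2j+1)i) mod q."""
    n = len(A)
    psi_inv, n_inv = pow(psi, -1, q), pow(n, -1, q)  # modular inverses (Python 3.8+)
    return [n_inv * sum(A[j] * pow(psi_inv, (2 * j + 1) * i, q) for j in range(n)) % q
            for i in range(n)]


def negacyclic_mul(a: list[int], b: list[int], q: int) -> list[int]:
    """Schoolbook product in Z_q[X]/(X^N + 1), using X^N = -1 to wrap high-degree terms."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c


# Toy parameters: q = 17 is prime with 2N | q - 1, and psi = 2 has order 8 (2^4 = 16 = -1 mod 17).
q, psi = 17, 2
a, b = [1, 2, 3, 4], [5, 6, 7, 8]

A, B = negacyclic_ntt(a, q, psi), negacyclic_ntt(b, q, psi)
C = [(x * y) % q for x, y in zip(A, B)]  # element-wise vector-vector modular multiplication
print(inverse_negacyclic_ntt(C, q, psi), negacyclic_mul(a, b, q))
```

Both printed lists agree. One forward NTT per operand, one element-wise modular multiplication, and one inverse NTT replace the O(N^2) schoolbook product. This is why the NTT and element-wise modular multiplication are the bottlenecks the abstract targets.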

8.8 · CR · Sep 29, 2021

Accelerating Encrypted Computing on Intel GPUs

Yujia Zhai, Mohannad Ibrahim, Yiqin Qiu et al.

Homomorphic Encryption (HE) is an emerging encryption scheme that allows computations to be performed directly on encrypted messages. This property enables promising applications such as privacy-preserving deep learning and cloud computing. Prior work has enabled practical privacy-preserving applications through architecture-aware optimizations on CPUs, GPUs, and FPGAs. However, there is no systematic optimization of the whole HE pipeline on Intel GPUs. In this paper, we present the first SYCL-based GPU backend for the Microsoft SEAL APIs. We perform optimizations at the instruction, algorithmic, and application levels to accelerate our HE library based on the Cheon, Kim, Kim and Song (CKKS) scheme on Intel GPUs. The performance is validated on two of the latest Intel GPUs. Experimental results show that our staged optimizations, together with low-level optimizations and kernel fusion, accelerate the Number Theoretic Transform (NTT), a key algorithm for HE, by up to 9.93x compared with the naïve GPU baseline. The roofline analysis confirms that our optimized NTT reaches 79.8% and 85.7% of the peak performance on the two GPU devices. Through the highly optimized NTT and assembly-level optimization, we obtain 2.32x-3.05x acceleration for HE evaluation routines. In addition, our systematic optimizations together improve the performance of the encrypted element-wise polynomial matrix multiplication application by up to 3.10x.
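The roofline model bounds a kernel's attainable throughput by min(peak compute, arithmetic intensity × memory bandwidth). Under one common reading, a percentage such as 79.8% is measured throughput divided by that bound. The paper may define its peak differently. The minimal Python sketch below computes both quantities. Every number in it is a hypothetical placeholder, not a measurement from the paper or a specification of any Intel GPU.

```python
def roofline_bound(peak_ops: float, bandwidth: float, intensity: float) -> float:
    """Attainable ops/s = min(compute peak, intensity [ops/byte] * bandwidth [bytes/s])."""
    return min(peak_ops, intensity * bandwidth)


def fraction_of_roofline(measured_ops: float, peak_ops: float, bandwidth: float,
                         intensity: float) -> float:
    """Fraction of the roofline bound that a measured kernel reaches."""
    return measured_ops / roofline_bound(peak_ops, bandwidth, intensity)


# Hypothetical device and kernel: 10 Tops/s peak, 500 GB/s bandwidth, 2 ops per byte moved.
peak, bw, ai = 10e12, 500e9, 2.0
print(f"bound = {roofline_bound(peak, bw, ai):.3e} ops/s")  # memory-bound: 2 * 500e9 < 10e12
print(f"fraction = {fraction_of_roofline(8.0e11, peak, bw, ai):.1%}")
```

At low arithmetic intensity the bound is set by bandwidth, not compute. That is why kernel fusion, which cuts memory traffic and raises ops per byte, can matter as much as instruction-level tuning for a transform like the NTT.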

16.0 · CR · Mar 30, 2021

Enabling Homomorphically Encrypted Inference for Large DNN Models

Guillermo Lloret-Talavera, Marc Jorda, Harald Servat et al.

The proliferation of machine learning services in the last few years has raised data privacy concerns. Homomorphic encryption (HE) enables inference on encrypted data, but it incurs 100x-10,000x memory and runtime overheads. Secure deep neural network (DNN) inference using HE is currently limited by computing and memory resources, with frameworks requiring hundreds of gigabytes of DRAM to evaluate small models. To overcome these limitations, in this paper we explore the feasibility of leveraging hybrid memory systems composed of DRAM and persistent memory. In particular, we use the recently released Intel Optane PMem technology and the Intel HE-Transformer for nGraph to run large neural networks such as MobileNetV2 (in its largest variant) and ResNet-50 for the first time in the literature. We present an in-depth analysis of the efficiency of the executions with different hardware and software configurations. Our results show that DNN inference using HE exhibits access patterns that are friendly to this memory configuration, yielding efficient executions.

13.6 · CR · Aug 10, 2020

Trustworthy AI Inference Systems: An Industry Research View

Rosario Cammarota, Matthias Schunter, Anand Rajan et al.

In this work, we provide an industry research view of the design, deployment, and operation of trustworthy Artificial Intelligence (AI) inference systems. Such systems provide customers with timely, informed, and customized inferences to aid their decisions, while using appropriate security protection mechanisms for AI models. Additionally, such systems should use Privacy-Enhancing Technologies (PETs) to protect customers' data at all times. To approach the subject, we start by introducing current trends in AI inference systems. We continue by elaborating on the relationship between Intellectual Property (IP) and private data protection in such systems. Regarding the protection mechanisms, we survey the security and privacy building blocks instrumental in designing, building, deploying, and operating private AI inference systems. For example, we highlight opportunities and challenges in AI systems that combine trusted execution environments with more recent advances in cryptographic techniques to protect data in use. Finally, we outline areas of further development that require the global collective attention of industry, academia, and government researchers to sustain the operation of trustworthy AI inference systems.

29.4 · CR · Aug 12, 2019

nGraph-HE2: A High-Throughput Framework for Neural Network Inference on Encrypted Data

Fabian Boemer, Anamaria Costache, Rosario Cammarota et al.

In previous work, Boemer et al. introduced nGraph-HE, an extension to the Intel nGraph deep learning (DL) compiler, that enables data scientists to deploy models with popular frameworks such as TensorFlow and PyTorch with minimal code changes. However, the class of supported models was limited to relatively shallow networks with polynomial activations. Here, we introduce nGraph-HE2, which extends nGraph-HE to enable privacy-preserving inference on standard, pre-trained models using their native activation functions and number fields (typically real numbers). The proposed framework leverages the CKKS scheme, whose support for real numbers is friendly to data science, and a client-aided model using a two-party approach to compute activation functions. We first present CKKS-specific optimizations, enabling a 3x-88x runtime speedup for scalar encoding, and doubling the throughput through a novel use of CKKS plaintext packing into complex numbers. Second, we optimize ciphertext-plaintext addition and multiplication, yielding 2.6x-4.2x runtime speedup. Third, we exploit two graph-level optimizations: lazy rescaling and depth-aware encoding, which allow us to significantly improve performance. Together, these optimizations enable state-of-the-art throughput of 1,998 images/s on the CryptoNets network. Using the client-aided model, we also present homomorphic evaluation of (to our knowledge) the largest network to date, namely, pre-trained MobileNetV2 models on the ImageNet dataset, with 60.4%/82.7% top-1/top-5 accuracy and an amortized runtime of 381 ms/image.
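CKKS encodes vectors of complex numbers, with N/2 slots for ring dimension N, so encoding only real values leaves the imaginary parts idle. The minimal Python sketch below works on plaintext values only, with no encryption, and its helper names are hypothetical. It illustrates the idea and is not nGraph-HE2's encoder. It shows why packing two real inputs into one complex slot is safe for additions and for multiplications by real plaintext weights: the real and imaginary halves never mix.

```python
def pack(xs: list[float], ys: list[float]) -> list[complex]:
    """Slot k carries two real inputs: xs[k] in the real part, ys[k] in the imaginary part."""
    return [complex(x, y) for x, y in zip(xs, ys)]


def unpack(zs: list[complex]) -> tuple[list[float], list[float]]:
    """Recover the two real vectors from the packed slots."""
    return [z.real for z in zs], [z.imag for z in zs]


def dot_real_weights(zs: list[complex], ws: list[float]) -> complex:
    """Linear layer with real plaintext weights: sum_k w_k * (x_k + i*y_k)."""
    return sum((w * z for w, z in zip(ws, zs)), 0j)


# Two hypothetical inputs (e.g. features of two images) evaluated with one packed vector.
x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
w = [0.5, -1.0, 2.0]
out = dot_real_weights(pack(x, y), w)
print(out.real, out.imag)  # 4.5 9.0, i.e. dot(x, w) and dot(y, w)

# Multiplying two packed values mixes the halves: (a + ib)(c + id) = (ac - bd) + i(ad + bc).
mixed = complex(2.0, 3.0) * complex(4.0, 5.0)  # (-7+22j), not the pair (8, 15)
```

The last line shows the limitation of this trick. A product of two packed operands no longer separates into the two original products, so the packing applies cleanly to operations whose other operand is a real plaintext.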

29.6 · CR · Oct 23, 2018 · Code

nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data

Fabian Boemer, Yixing Lao, Rosario Cammarota et al.

Homomorphic encryption (HE), the ability to perform computation on encrypted data, is an attractive remedy for increasing concerns about data privacy in deep learning (DL). However, building DL models that operate on ciphertext is currently labor-intensive and requires simultaneous expertise in DL, cryptography, and software engineering. DL frameworks and recent advances in graph compilers have greatly accelerated the training and deployment of DL models to various computing platforms. We introduce nGraph-HE, an extension of nGraph, Intel's DL graph compiler, which enables deployment of trained models with popular frameworks such as TensorFlow while simply treating HE as another hardware target. Our graph-compiler approach enables HE-aware optimizations implemented at compile time, such as constant folding and HE-SIMD packing, and at run time, such as special value plaintext bypass. Furthermore, nGraph-HE integrates with DL frameworks such as TensorFlow, enabling data scientists to benchmark DL models with minimal overhead.
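Special value plaintext bypass skips a homomorphic operation when the plaintext operand makes the result trivial, for example multiplying by all ones or adding all zeros. The minimal Python sketch below assumes a hypothetical toy backend. Its `ToyCiphertext` merely wraps unencrypted values, and the backend counts the "expensive" ciphertext-plaintext multiplications it actually performs. nGraph-HE's real rules and APIs may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyCiphertext:
    """Hypothetical stand-in: a real backend would hold encrypted polynomials, not plain slots."""
    slots: list[float]


class ToyBackend:
    def __init__(self) -> None:
        self.he_multiplies = 0  # counts the costly ciphertext-plaintext multiplications

    def multiply_plain(self, ct: ToyCiphertext, plain: list[float]) -> ToyCiphertext:
        if all(p == 1 for p in plain):
            return ct  # x * 1 = x: no HE operation needed
        if all(p == -1 for p in plain):
            return ToyCiphertext([-s for s in ct.slots])  # negation is typically cheaper than a multiply
        if all(p == 0 for p in plain):
            # x * 0 = 0; a real backend would still return a valid encryption of zero
            return ToyCiphertext([0.0] * len(ct.slots))
        self.he_multiplies += 1
        return ToyCiphertext([s * p for s, p in zip(ct.slots, plain)])

    def add_plain(self, ct: ToyCiphertext, plain: list[float]) -> ToyCiphertext:
        if all(p == 0 for p in plain):
            return ct  # x + 0 = x: no HE operation needed
        return ToyCiphertext([s + p for s, p in zip(ct.slots, plain)])


backend = ToyBackend()
ct = ToyCiphertext([1.5, -2.0])
backend.multiply_plain(ct, [1, 1])    # bypassed
backend.multiply_plain(ct, [-1, -1])  # bypassed via negation
result = backend.multiply_plain(ct, [2.0, 3.0])
print(result.slots, backend.he_multiplies)  # [3.0, -6.0] 1
```

The checks are cheap scans of values the server already holds in the clear. In trained models, constants of 0 or ±1 (for example, in masks or identity-like weights) can therefore save whole HE operations at run time.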