CROct 13, 2022
ScionFL: Efficient and Robust Secure Quantized AggregationYaniv Ben-Itzhak, Helen Möllering, Benny Pinkas et al.
Secure aggregation is commonly used in federated learning (FL) to alleviate privacy concerns related to the central aggregator seeing all parameter updates in the clear. Unfortunately, most existing secure aggregation schemes ignore two critical orthogonal research directions that aim to (i) significantly reduce client-server communication and (ii) mitigate the impact of malicious clients. However, both of these additional properties are essential to facilitate cross-device FL with thousands or even millions of (mobile) participants. In this paper, we unite both research directions by introducing ScionFL, the first secure aggregation framework for FL that operates efficiently on quantized inputs and simultaneously provides robustness against malicious clients. Our framework leverages (novel) multi-party computation (MPC) techniques and supports multiple linear (1-bit) quantization schemes, including ones that utilize the randomized Hadamard transform and Kashin's representation. Our theoretical results are supported by extensive evaluations. We show that with no overhead for clients and moderate overhead for the server compared to transferring and processing quantized updates in plaintext, we obtain comparable accuracy for standard FL benchmarks. Moreover, we demonstrate the robustness of our framework against state-of-the-art poisoning attacks.
CRMay 30
One (Thread) Can Keep a (PRNG) Secret, but not TwoEhood Porat, Amit Klein, Benny Pinkas
We present a novel, practical attack on the IPv6 Fragment ID generation algorithm of XNU, which is the kernel used by Apple products such as macOS and iOS. This attack exploits a race-condition vulnerability in the algorithm's pseudorandom number generator (PRNG) to cryptanalytically break, learn the internal state of the generator, and consequently predict fragment IDs, which, in turn, facilitates an IPv6 fragment spoofing attack. As far as we know, this is the first cryptanalytic attack that is based on exploiting race-conditions. With fragment spoofing, it is possible to partially manipulate UDP datagrams and TCP segments. We showcase a new type of attack on NFS (UDP) where an off-path attacker modifies a file as it is written, and an attack on HTTP (TCP) where an off-path attacker modifies an HTTP request. Apple assigned this vulnerability the CVE identifier CVE-2024-27823 and patched all its XNU-based products against the attack.
AISep 3, 2020
Fairness in the Eyes of the Data: Certifying Machine-Learning ModelsShahar Segal, Yossi Adi, Benny Pinkas et al.
We present a framework that allows to certify the fairness degree of a model based on an interactive and privacy-preserving test. The framework verifies any trained model, regardless of its training process and architecture. Thus, it allows us to evaluate any deep learning model on multiple fairness definitions empirically. We tackle two scenarios, where either the test data is privately available only to the tester or is publicly known in advance, even to the model creator. We investigate the soundness of the proposed approach using theoretical analysis and present statistical guarantees for the interactive test. Finally, we provide a cryptographic technique to automate fairness testing and certified inference with only black-box access to the model at hand while hiding the participants' sensitive data.
CRJun 25, 2019
From IP ID to Device ID and KASLR Bypass (Extended Version)Amit Klein, Benny Pinkas
IP headers include a 16-bit ID field. Our work examines the generation of this field in Windows (versions 8 and higher), Linux and Android, and shows that the IP ID field enables remote servers to assign a unique ID to each device and thus be able to identify subsequent transmissions sent from that device. This identification works across all browsers and over network changes. In modern Linux and Android versions, this field leaks a kernel address, thus we also break KASLR. Our work includes reverse-engineering of the Windows IP ID generation code, and a cryptanalysis of this code and of the Linux kernel IP ID generation code. It provides practical techniques to partially extract the key used by each of these algorithms, overcoming different implementation issues, and observing that this key can identify individual devices. We deployed a demo (for Windows) showing that key extraction and machine fingerprinting works in the wild, and tested it from networks around the world.
LGFeb 13, 2018
Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by BackdooringYossi Adi, Carsten Baum, Moustapha Cisse et al.
Deep Neural Networks have recently gained lots of success after enabling several breakthroughs in notoriously challenging problems. Training these networks is computationally expensive and requires vast amounts of training data. Selling such pre-trained models can, therefore, be a lucrative business model. Unfortunately, once the models are sold they can be easily copied and redistributed. To avoid this, a tracking mechanism to identify models as the intellectual property of a particular vendor is necessary. In this work, we present an approach for watermarking Deep Neural Networks in a black-box way. Our scheme works for general classification tasks and can easily be combined with current learning algorithms. We show experimentally that such a watermark has no noticeable impact on the primary task that the model is designed for and evaluate the robustness of our proposal against a multitude of practical attacks. Moreover, we provide a theoretical analysis, relating our approach to previous work on backdooring.
LGFeb 13, 2018
Deceiving End-to-End Deep Learning Malware Detectors using Adversarial ExamplesFelix Kreuk, Assi Barak, Shir Aviv-Reuven et al.
In recent years, deep learning has shown performance breakthroughs in many applications, such as image detection, image segmentation, pose estimation, and speech recognition. However, this comes with a major concern: deep networks have been found to be vulnerable to adversarial examples. Adversarial examples are slightly modified inputs that are intentionally designed to cause a misclassification by the model. In the domains of images and speech, the modifications are so small that they are not seen or heard by humans, but nevertheless greatly affect the classification of the model. Deep learning models have been successfully applied to malware detection. In this domain, generating adversarial examples is not straightforward, as small modifications to the bytes of the file could lead to significant changes in its functionality and validity. We introduce a novel loss function for generating adversarial examples specifically tailored for discrete input sets, such as executable bytes. We modify malicious binaries so that they would be detected as benign, while preserving their original functionality, by injecting a small sequence of bytes (payload) in the binary file. We applied this approach to an end-to-end convolutional deep learning malware detection model and show a high rate of detection evasion. Moreover, we show that our generated payload is robust enough to be transferable within different locations of the same file and across different files, and that its entropy is low and similar to that of benign data sections.
CRJun 6, 2016
The Circle Game: Scalable Private Membership Test Using Trusted HardwareSandeep Tamrakar, Jian Liu, Andrew Paverd et al.
Malware checking is changing from being a local service to a cloud-assisted one where users' devices query a cloud server, which hosts a dictionary of malware signatures, to check if particular applications are potentially malware. Whilst such an architecture gains all the benefits of cloud-based services, it opens up a major privacy concern since the cloud service can infer personal traits of the users based on the lists of applications queried by their devices. Private membership test (PMT) schemes can remove this privacy concern. However, known PMT schemes do not scale well to a large number of simultaneous users and high query arrival rates. We propose a simple PMT approach using a carousel: circling the entire dictionary through trusted hardware on the cloud server. Users communicate with the trusted hardware via secure channels. We show how the carousel approach, using different data structures to represent the dictionary, can be realized on two different commercial hardware security architectures (ARM TrustZone and Intel SGX). We highlight subtle aspects of securely implementing seemingly simple PMT schemes on these architectures. Through extensive experimental analysis, we show that for the malware checking scenario our carousel approach surprisingly outperforms Path ORAM on the same hardware by supporting a much higher query arrival rate while guaranteeing acceptable response latency for individual queries.