CR · LG · Aug 30, 2023

Split Without a Leak: Reducing Privacy Leakage in Split Learning

Khoa Nguyen, Tanveer Khan, Antonis Michalas

arXiv:2308.15783v111.414 citations · h-index: 22 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses privacy risks for sensitive data in deep learning, particularly in collaborative settings. It is an incremental advance, as it builds on existing split learning and encryption techniques.

The paper tackles privacy leakage in split learning by proposing a hybrid approach that combines split learning with homomorphic encryption: the client encrypts its activation maps before sending them to the server, so the server cannot reconstruct the client's input data from them. On the MIT-BIH dataset, this method trains about 6 times faster and incurs almost 160 times less communication overhead than other HE-based approaches.

The popularity of Deep Learning (DL) makes the privacy of sensitive data more imperative than ever. As a result, various privacy-preserving techniques have been implemented to protect user data in DL. Among these, collaborative learning techniques such as Split Learning (SL) have been utilized to accelerate the learning and prediction process. Initially, SL was considered a promising approach to data privacy. However, subsequent research has demonstrated that SL is susceptible to many types of attacks and therefore cannot serve as a privacy-preserving technique on its own. Meanwhile, countermeasures that combine SL with encryption have been introduced to achieve privacy-preserving deep learning. In this work, we propose a hybrid approach using SL and Homomorphic Encryption (HE). The idea is that the client encrypts the activation map (the output of the split layer between the client and the server) before sending it to the server. Hence, during both forward and backward propagation, the server cannot reconstruct the client's input data from the intermediate activation map. This improvement is important because it reduces privacy leakage compared to other SL-based works, where the server can gain valuable information about the client's input. In addition, on the MIT-BIH dataset, our proposed hybrid approach trains about 6 times faster and reduces communication overhead almost 160-fold compared to other HE-based approaches, thereby offering improved privacy protection for sensitive data in DL.
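
The core idea in the abstract, that the server applies its layer to an encrypted activation map it cannot read, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name an HE scheme, so this uses a toy Paillier cryptosystem (additively homomorphic, pure Python, insecure key size) rather than whatever the authors use, and the function names, the fixed-point `SCALE`, and the single linear neuron on the server are hypothetical. It covers the forward pass only.

```python
import math
import random

# Toy Paillier parameters (assumption: picked for a fast, dependency-free demo).
# Both primes are Mersenne primes; the modulus is far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                                            # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                                 # private: LAM^-1 mod N
SCALE = 1000                                         # hypothetical fixed-point scale


def encode(x, power=1):
    """Map a float to a fixed-point integer with SCALE**power resolution."""
    return round(x * SCALE**power)


def encrypt(m):
    """Encrypt integer m under the public key (N, g = N + 1)."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt with the private key; values above N/2 are read as negative."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def client_send(activations):
    """Client side: encrypt the split-layer activation map element-wise."""
    return [encrypt(encode(a)) for a in activations]


def server_linear(enc_acts, weights, bias):
    """Server side: compute Enc(sum(w_i * a_i) + b) without decrypting.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to an
    integer power multiplies its plaintext by that integer.
    """
    acc = encrypt(encode(bias, power=2))  # bias carries SCALE**2 like w_i * a_i
    for c, w in zip(enc_acts, weights):
        acc = acc * pow(c, encode(w) % N, N2) % N2
    return acc


def client_receive(enc_out):
    """Client side: decrypt the server's output and undo the fixed-point scale."""
    return decrypt(enc_out) / SCALE**2


def demo():
    enc = client_send([0.5, -1.25, 2.0])            # server only ever sees `enc`
    enc_out = server_linear(enc, [1.5, 0.4, -0.25], 0.1)
    return client_receive(enc_out)                  # approx. the plaintext result
```

In this sketch the server holds only the public modulus and its own plaintext weights, so it never sees the activations in the clear; only the client, which holds `LAM` and `MU`, can read the result. A real deployment would use a vetted HE library with secure parameters, and the backward pass mentioned in the abstract would need its own encrypted handling, which is not shown here.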

