CR AI Mar 6, 2024

Wildest Dreams: Reproducible Research in Privacy-preserving Neural Network Training

Tanveer Khan, Mindaugas Budzys, Khoa Nguyen, Antonis Michalas

arXiv:2403.03592v1, 5.83 citations, h-index: 20, Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the challenge of applying privacy-preserving techniques in real-world scenarios, a concern for data scientists and researchers who handle sensitive data. Its contribution is incremental, since it synthesizes and evaluates existing methods rather than proposing new ones.

This work examines the gap between theoretical research and practical applications in Privacy-Preserving Machine Learning (PPML), focusing on Homomorphic Encryption and Secure Multi-party Computation for model training, and provides a systematic comparison of recent frameworks and their reproducibility.

Machine Learning (ML) addresses a multitude of complex issues in multiple disciplines, including social sciences, finance, and medical research. ML models require substantial computing power and are only as powerful as the data they are trained on. Due to the high computational cost of ML methods, data scientists frequently use Machine Learning-as-a-Service (MLaaS) to outsource computation to external servers. However, when working with private information, such as financial data or health records, outsourcing the computation can create privacy risks. Recent advances in Privacy-Preserving Techniques (PPTs) have enabled ML training and inference over protected data through Privacy-Preserving Machine Learning (PPML). However, these techniques are still at a preliminary stage, and applying them in real-world situations remains demanding. To understand the discrepancy between theoretical research proposals and actual applications, this work examines the past and present of PPML, focusing on Homomorphic Encryption (HE) and Secure Multi-party Computation (SMPC) applied to ML. This work primarily focuses on the ML model's training phase, where maintaining user data privacy is of utmost importance. We provide a solid theoretical background that eases the understanding of current approaches and their limitations. In addition, we present a Systematization of Knowledge (SoK) of the most recent PPML frameworks for model training and provide a comprehensive comparison of their unique properties and their performance on standard benchmarks. We also reproduce the results of some of the papers and examine to what extent existing works in the field support open science. We believe our work serves as a valuable contribution by raising awareness of the current gap between theoretical advancements and real-world applications in PPML, specifically regarding open-source availability, reproducibility, and usability.
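As a rough illustration of the SMPC idea mentioned in the abstract, the following is a minimal sketch of additive secret sharing, one common building block of SMPC protocols. It is not taken from the paper or from any of the frameworks it surveys. The modulus `PRIME`, the three-party setting, and the helper names `share`, `add_shared`, and `reconstruct` are all assumptions chosen for illustration.

```python
import secrets

# Assumption: an illustrative prime modulus; real frameworks choose their own ring or field.
PRIME = 2**61 - 1


def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


def reconstruct(shares):
    """Recombine all shares to recover the secret."""
    return sum(shares) % PRIME


# Hypothetical inputs: two private values held by different data owners.
x_shares = share(42)
y_shares = share(100)
z_shares = add_shared(x_shares, y_shares)
total = reconstruct(z_shares)
```

In this sketch the sum is computed without any party seeing the other's input, because a single share is just a random-looking value modulo `PRIME`. Training an ML model additionally requires multiplications and non-linear functions, which need more involved protocols than this addition-only example.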
