CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning
This work addresses the need for fast, scalable privacy-preserving machine learning, particularly in distributed settings, though the contribution is incremental in that it builds on existing privacy and parallelization techniques.
The paper tackles the problem of training machine learning models while keeping the data private. It introduces CodedPrivateML, a framework that keeps both the data and the model information-theoretically private while enabling efficient distributed training, and it reports significant speedup over cryptographic approaches based on multi-party computing (MPC) in experiments on Amazon EC2.
How can we train a machine learning model while keeping the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically private, while allowing efficient parallelization of training across distributed workers. We characterize CodedPrivateML's privacy threshold and prove its convergence for logistic (and linear) regression. Furthermore, via extensive experiments on Amazon EC2, we demonstrate that CodedPrivateML provides significant speedup over cryptographic approaches based on multi-party computing (MPC).
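To make "information-theoretically private" and "privacy threshold" concrete, here is a minimal sketch using Shamir secret sharing over a prime field, a standard generic technique. It is an illustration under stated assumptions and is not taken from the paper: the text above does not describe CodedPrivateML's actual encoding. The names `share`, `worker_dot`, and `reconstruct`, the field size `P`, and the threshold `t` are all hypothetical. For simplicity the weight vector `w` is public here, whereas CodedPrivateML also keeps the model private, and only a linear computation is shown rather than logistic regression.

```python
import random

P = 2**31 - 1  # a prime; all arithmetic is done modulo P (assumed field size)


def share(secret, n_workers, t, rng):
    """Split an integer vector into n_workers share vectors.

    Each coordinate is hidden as the constant term of a random degree-t
    polynomial, so any t shares are statistically independent of the secret.
    """
    polys = [[s % P] + [rng.randrange(P) for _ in range(t)] for s in secret]
    shares = []
    for alpha in range(1, n_workers + 1):  # worker i evaluates at point alpha = i
        shares.append(
            [sum(c * pow(alpha, k, P) for k, c in enumerate(poly)) % P
             for poly in polys]
        )
    return shares


def worker_dot(share_vec, w):
    """Computation run locally by one worker on its share (w is public here)."""
    return sum(s * wi for s, wi in zip(share_vec, w)) % P


def reconstruct(points):
    """Lagrange-interpolate the value at 0 from (alpha, value) pairs."""
    total = 0
    for i, (ai, yi) in enumerate(points):
        num, den = 1, 1
        for j, (aj, _) in enumerate(points):
            if i != j:
                num = num * (-aj) % P
                den = den * (ai - aj) % P
        # Modular inverse via Fermat's little theorem (P is prime).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


x = [3, 1, 4]   # private data vector
w = [2, 7, 1]   # public weights (simplifying assumption)
shares = share(x, n_workers=5, t=2, rng=random.Random(0))
results = [(i + 1, worker_dot(shares[i], w)) for i in range(5)]
print(reconstruct(results[:3]))  # 17, i.e. the dot product of x and w
```

Because the dot product is linear, each worker's result is itself a share of the true answer, so any `t + 1` worker results suffice to recover it while any `t` colluding workers learn nothing about `x`. The sketch also assumes inputs are already integers in the field; real-valued data would first need some form of quantization.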