S3ML: A Secure Serving System for Machine Learning Inference
This work addresses the privacy concerns that users and organizations face with ML serving systems. It is an incremental improvement that integrates Intel SGX with existing infrastructure such as Kubernetes.
The paper tackles the problem of securing machine learning inference by introducing S3ML, a system that runs models in Intel SGX enclaves to protect user privacy. Experiments on widely used models demonstrate its low overhead, high availability, and scalability.
In this paper, we present S3ML, a secure serving system for machine learning inference. S3ML runs machine learning models in Intel SGX enclaves to protect users' privacy. It provides a secure key management service for constructing flexible privacy-preserving server clusters and proposes novel SGX-aware load balancing and scaling methods to satisfy users' Service-Level Objectives. We have implemented S3ML on Kubernetes as a low-overhead, highly available, and scalable system. We demonstrate the performance and effectiveness of S3ML through extensive experiments on a series of widely used models.
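To make the idea of SGX-aware load balancing concrete, the following is a minimal sketch and not the method used by S3ML, which the abstract does not detail. It assumes that the relevant SGX-specific signal is enclave memory pressure, since enclaves have a limited protected memory budget (the EPC) and exceeding it incurs paging overhead. The `Replica` type, its fields, and the `pick_replica` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """Hypothetical view of one enclave-backed model server."""
    name: str
    epc_used_mb: float   # enclave memory currently in use (assumed metric)
    epc_limit_mb: float  # usable enclave memory budget (assumed metric)
    inflight: int        # requests currently being served


def pick_replica(replicas: List[Replica], request_mb: float) -> Replica:
    """Pick a replica for a request that needs `request_mb` of enclave memory.

    Prefer replicas where the request still fits in the enclave memory
    budget, and among those choose the one with the fewest in-flight
    requests. If no replica has room, fall back to the one that would
    exceed its budget by the least.
    """
    fits = [r for r in replicas
            if r.epc_used_mb + request_mb <= r.epc_limit_mb]
    if fits:
        # Tie-break on memory use so the lighter enclave wins.
        return min(fits, key=lambda r: (r.inflight, r.epc_used_mb))
    return min(replicas,
               key=lambda r: r.epc_used_mb + request_mb - r.epc_limit_mb)


replicas = [
    Replica("a", epc_used_mb=80, epc_limit_mb=90, inflight=1),
    Replica("b", epc_used_mb=40, epc_limit_mb=90, inflight=3),
    Replica("c", epc_used_mb=60, epc_limit_mb=90, inflight=2),
]
chosen = pick_replica(replicas, request_mb=20)
```

In this example, replica `a` has the fewest in-flight requests but no room for a 20 MB request, so a policy that ignored enclave memory would pick it and risk paging. The sketch instead selects among the replicas with headroom. An empty replica list raises `ValueError`, which a real system would handle by triggering scaling.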