Cocoon: A System Architecture for Differentially Private Training with Correlated Noises
This work addresses efficiency bottlenecks in privacy-preserving machine learning, which matters to data owners, though it is incremental in that it builds on existing correlated-noise mechanisms.
The authors tackle the performance overhead of differentially private training with correlated noises, particularly for large models and models with large embedding tables, by proposing Cocoon, a hardware-software co-designed framework that accelerates training by 2.33-10.82x for models with embedding tables and 1.55-3.06x for large models.
Machine learning (ML) models memorize and leak training data, causing serious privacy issues for data owners. Training algorithms with differential privacy (DP), such as DP-SGD, have been gaining attention as a solution. However, DP-SGD adds noise at each training iteration, which degrades the accuracy of the trained model. To improve accuracy, a new family of approaches adds carefully designed correlated noises, so that the noises cancel each other out across iterations. We perform an extensive characterization study of these new mechanisms, for the first time to the best of our knowledge, and show that they incur non-negligible overheads when the model is large or uses large embedding tables. Motivated by this analysis, we propose Cocoon, a hardware-software co-designed framework for efficient training with correlated noises. Cocoon accelerates models with embedding tables by pre-computing and storing correlated noises in a coalesced format (Cocoon-Emb), and supports large models through a custom near-memory processing device (Cocoon-NMP). On a real system with an FPGA-based NMP device prototype, Cocoon improves performance by 2.33-10.82x (Cocoon-Emb) and 1.55-3.06x (Cocoon-NMP).
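The idea that correlated noises "cancel each other out across iterations" can be illustrated with a toy sketch. The code below is not Cocoon or any published mechanism: the function name `accumulated_noise`, the `anti_corr` parameter, and the one-step cancellation rule are all assumptions made for illustration. It tracks a single scalar parameter and ignores privacy calibration entirely (a real mechanism must rescale the noise to keep the same DP guarantee).

```python
import random


def accumulated_noise(num_steps, sigma, anti_corr, seed=0):
    """Return the total noise accumulated in one parameter after each step.

    Hypothetical toy model, not a calibrated DP mechanism.
    anti_corr = 0.0: independent noise per step (DP-SGD style).
    anti_corr = 1.0: each step also subtracts the previous step's fresh
    draw, so earlier noise is cancelled and only the latest draw remains.
    """
    rng = random.Random(seed)
    prev_z = 0.0      # previous fresh draw; must be kept between steps
    total = 0.0       # sum of all noise injected so far
    totals = []
    for _ in range(num_steps):
        z = rng.gauss(0.0, sigma)            # fresh Gaussian draw
        injected = z - anti_corr * prev_z    # correlated with last step
        total += injected
        totals.append(total)
        prev_z = z
    return totals


if __name__ == "__main__":
    independent = accumulated_noise(1000, sigma=1.0, anti_corr=0.0)
    correlated = accumulated_noise(1000, sigma=1.0, anti_corr=1.0)
    print("independent, final accumulated noise:", independent[-1])
    print("correlated,  final accumulated noise:", correlated[-1])
```

With `anti_corr=1.0` the sum telescopes, so the accumulated noise after any step equals only the most recent draw, whereas with `anti_corr=0.0` it is the sum of all draws. Even this toy version has to keep the previous draw between steps. For a real model that state is as large as the parameter vector, which gives some intuition for why such mechanisms may become costly for large models and embedding tables.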