DC, CR, LG · Dec 11, 2024

Protecting Confidentiality, Privacy and Integrity in Collaborative Learning

arXiv:2412.08534v2 · 2 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This paper addresses security concerns for dataset owners and model owners in collaborative ML. Its solution is comprehensive but incremental, since it enhances existing techniques such as differential privacy and TEEs rather than introducing new ones.

The paper introduces Citadel++, a collaborative machine learning training system that protects the confidentiality of datasets, models, and training code as well as the privacy of individual users. It reports speedups of up to 543x on CPU TEEs and 113x on GPU TEEs over state-of-the-art privacy-preserving training systems.

A collaboration between dataset owners and model owners is needed to facilitate effective machine learning (ML) training. During this collaboration, however, dataset owners and model owners want to protect the confidentiality of their respective assets (i.e., datasets, models and training code), with the dataset owners also caring about the privacy of individual users whose data is in their datasets. Existing solutions either provide limited confidentiality for models and training code, or suffer from privacy issues due to collusion. We present Citadel++, a collaborative ML training system designed to simultaneously protect the confidentiality of datasets, models and training code as well as the privacy of individual users. Citadel++ enhances differential privacy mechanisms to safeguard the privacy of individual user data while maintaining model utility. By employing Virtual Machine-level Trusted Execution Environments (TEEs) as well as the improved sandboxing and integrity mechanisms through OS-level techniques, Citadel++ effectively preserves the confidentiality of datasets, models and training code, and enforces our privacy mechanisms even when the models and training code have been maliciously designed. Our experiments show that Citadel++ provides model utility and performance while adhering to the confidentiality and privacy requirements of dataset owners and model owners, outperforming the state-of-the-art privacy-preserving training systems by up to 543x on CPU and 113x on GPU TEEs.
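The abstract does not spell out how Citadel++ enhances differential privacy, so the following is only a minimal sketch of the generic clip-and-noise aggregation used in standard DP-SGD-style training, included to make the privacy idea concrete. The function name `clip_and_noise` and its parameters (`clip_norm`, `noise_multiplier`, `rng`) are hypothetical and are not taken from the paper.

```python
import math
import random


def clip_and_noise(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Generic DP-SGD-style aggregation (an assumption for illustration),
    not the mechanism implemented by Citadel++.
    """
    if rng is None:
        rng = random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down any gradient whose L2 norm exceeds clip_norm,
        # which bounds the influence of a single example.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # The noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]
```

Setting `noise_multiplier` to 0 disables the noise and leaves only the clipping, which is useful for checking the arithmetic. In a real deployment the noise scale would be chosen from a target privacy budget, and according to the abstract Citadel++ additionally enforces its privacy mechanisms inside TEEs so that malicious training code cannot bypass them.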

