LG CR OC · May 25, 2023

Learning across Data Owners with Joint Differential Privacy

Yangsibo Huang, Haotian Jiang, Daogao Liu, Mohammad Mahdian, Jieming Mao, Vahab Mirrokni

arXiv:2305.15723

Originality: Incremental advance

AI Analysis

This work addresses privacy-preserving model training across multiple data owners, building incrementally on prior work that focused on linear regression.

The paper tackles collaborative machine learning under joint differential privacy for stochastic convex optimization, presenting a DP-SGD variant with theoretical population loss bounds and empirical validation on multi-class classification datasets.

In this paper, we study the setting in which data owners train machine learning models collaboratively under a privacy notion called joint differential privacy [Kearns et al., 2018]. In this setting, the model trained for each data owner $j$ uses $j$'s data without privacy consideration and other owners' data with differential privacy guarantees. This setting was initiated in [Jain et al., 2021] with a focus on linear regression. In this paper, we study this setting for stochastic convex optimization (SCO). We present an algorithm that is a variant of DP-SGD [Song et al., 2013; Abadi et al., 2016] and provide theoretical bounds on its population loss. We compare our algorithm to several baselines and discuss the parameter regimes in which our algorithm is preferable. We also empirically study joint differential privacy in the multi-class classification problem over two public datasets. Our empirical findings are consistent with the insights from our theoretical results.
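The abstract describes the joint-DP split in words: owner $j$'s model may use $j$'s own data freely, but may use other owners' data only through a differentially private channel. The sketch below is a hypothetical illustration of that split, not the authors' algorithm (the abstract only says theirs is a DP-SGD variant). It assumes a binary logistic-loss classifier, one model per owner, and a per-step gradient that mixes owner $j$'s exact gradient with a clipped, Gaussian-noised average of everyone else's per-example gradients. The names `joint_dp_sgd`, `alpha`, `clip`, and `sigma` are illustrative. The sketch does no privacy accounting: every record feeds the noisy sums of $k - 1$ models at every step, so calibrating `sigma` to a target $(\varepsilon, \delta)$ would need a composition analysis that is omitted here. It also assumes at least two owners.

```python
import numpy as np


def clip_rows(g, c):
    """Rescale each row of g so that its L2 norm is at most c (DP-SGD-style clipping)."""
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g * np.minimum(1.0, c / np.maximum(norms, 1e-12))


def logistic_grads(w, X, y):
    """Per-example gradients of log(1 + exp(-y * x.w)) for labels y in {-1, +1}."""
    margins = y * (X @ w)
    # Gradient is -y * sigmoid(-margin) * x; the tanh form avoids exp overflow.
    coef = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    return coef[:, None] * X


def joint_dp_sgd(owners, steps=100, lr=0.1, clip=1.0, sigma=1.0, alpha=0.5, seed=0):
    """Hypothetical joint-DP-style training: returns one weight vector per owner.

    owners: list of (X_j, y_j) pairs, one per data owner (needs at least two).
    alpha:  weight on owner j's own, non-private gradient.
    sigma:  noise multiplier for the other owners' clipped gradient sum.
    Assumption: sigma is NOT calibrated to any (eps, delta); no accounting is done.
    """
    rng = np.random.default_rng(seed)
    k = len(owners)
    d = owners[0][0].shape[1]
    W = np.zeros((k, d))
    for _ in range(steps):
        W_next = W.copy()
        for j in range(k):
            X_j, y_j = owners[j]
            # Owner j's own data: exact mean gradient, no clipping and no noise.
            own = logistic_grads(W[j], X_j, y_j).mean(axis=0)
            # Other owners' data: clip each example, sum, then add Gaussian noise.
            X_o = np.vstack([X for i, (X, _) in enumerate(owners) if i != j])
            y_o = np.concatenate([y for i, (_, y) in enumerate(owners) if i != j])
            noisy_sum = clip_rows(logistic_grads(W[j], X_o, y_o), clip).sum(axis=0)
            noisy_sum += rng.normal(0.0, sigma * clip, size=d)
            others = noisy_sum / len(y_o)
            W_next[j] = W[j] - lr * (alpha * own + (1.0 - alpha) * others)
        W = W_next
    return W


# Usage: three owners whose labels depend on the first feature.
rng = np.random.default_rng(1)
owners = []
for _ in range(3):
    X = rng.normal(size=(100, 3))
    owners.append((X, np.sign(X[:, 0])))
W = joint_dp_sgd(owners, sigma=1.0, alpha=0.5)
print(W.shape)  # (3, 3)
```

Setting `alpha = 1` reduces each model to non-private gradient descent on its owner's data alone, and `alpha = 0` uses only the noisy signal from the other owners. Intermediate values trade the two off, which is one simple way to explore the kind of baseline comparison the abstract mentions.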
