PIVODL: Privacy-preserving vertical federated learning over distributed labels
The work addresses a privacy challenge in federated learning for real-world applications where labels are distributed across clients, but the contribution is incremental: it adapts existing methods to a more realistic scenario.
The paper tackles the problem of training gradient boosting decision trees (GBDT) in vertical federated learning when data labels are distributed across multiple clients, a setting that previous studies avoided by assuming all labels reside on a single client. It reports that the proposed PIVODL framework incurs negligible information leakage and negligible model performance degradation.
Federated learning (FL) is an emerging privacy-preserving machine learning protocol that allows multiple devices to collaboratively train a shared global model without revealing their private local data. Non-parametric models like gradient boosting decision trees (GBDT) have been commonly used in FL for vertically partitioned data. However, existing studies assume that all the data labels are stored on only one client, which may be unrealistic for real-world applications. Therefore, in this work, we propose a secure vertical FL framework, named PIVODL, to train GBDT with data labels distributed on multiple devices. Both homomorphic encryption and differential privacy are adopted to prevent label information from being leaked through transmitted gradients and leaf values. Our experimental results show that both information leakage and model performance degradation of the proposed PIVODL are negligible.
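
To make the role of differential privacy on leaf values more concrete, the following is a minimal sketch, not the PIVODL protocol itself. It assumes an XGBoost-style second-order leaf weight, a clipping bound on that weight, and the standard Laplace mechanism; the function names and the parameters `clip`, `epsilon`, and `reg_lambda` are hypothetical and chosen only for illustration. The homomorphic encryption of gradients is omitted.

```python
import random


def leaf_weight(grads, hess, reg_lambda=1.0):
    """Second-order (XGBoost-style) leaf weight: -sum(g) / (sum(h) + lambda)."""
    return -sum(grads) / (sum(hess) + reg_lambda)


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential samples that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_leaf_weight(grads, hess, epsilon, clip=1.0, reg_lambda=1.0, rng=None):
    """Clip the leaf weight to [-clip, clip] and add Laplace noise.

    Assumption for this sketch: after clipping, any two possible leaf
    values differ by at most 2 * clip, so that bound is used as the
    sensitivity and the noise scale is 2 * clip / epsilon.
    """
    rng = rng or random.Random()
    w = leaf_weight(grads, hess, reg_lambda)
    w = max(-clip, min(clip, w))
    return w + laplace_noise(2.0 * clip / epsilon, rng)


# Toy gradients and Hessians for the samples that fall into one leaf.
grads = [0.5, -0.2, 0.3]
hess = [0.25, 0.25, 0.25]

exact = leaf_weight(grads, hess)
private = noisy_leaf_weight(grads, hess, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives stronger privacy but a noisier leaf value, which is the trade-off behind the claim that model performance degradation stays negligible.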