CR, LG · Aug 14, 2020

Privacy Preserving Vertical Federated Learning for Tree-based Models

Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen, Beng Chin Ooi

arXiv:2008.06170v1 · 259 citations

AI Analysis

The paper addresses privacy concerns for organizations that collaborate on machine learning when their features are disjoint and only one of them holds the labels, offering a method for privacy-preserving decision tree training.

The paper tackles the problem of training tree-based models in vertical federated learning without revealing private data. It proposes Pivot, a solution that protects against a semi-honest adversary compromising up to m-1 of the m clients, and supports its efficiency claim with theoretical and experimental analysis.

Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies vertical federated learning, which addresses scenarios in which (i) collaborating organizations own data of the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propose Pivot, a novel solution for privacy-preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than what the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise $m-1$ out of $m$ clients. We further identify two privacy leakages that arise when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision trees (GBDT), by treating single decision trees as building blocks. Theoretical and experimental analyses suggest that Pivot is efficient for the privacy achieved.
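
To make the vertical setting concrete, the following is a minimal plaintext sketch of two clients that hold disjoint features for the same users, with only one client holding the labels. It is not Pivot's protocol and provides no privacy protection; it only shows why scoring a candidate split on one client's feature needs information held by another client. The client names, feature names, values, and the choice of Gini impurity as the split criterion are all assumptions made for illustration (the abstract does not name a split criterion).

```python
# Toy vertical partition: same users, disjoint features, one label holder.
# All names and values below are hypothetical and for illustration only.

USER_IDS = ["u1", "u2", "u3", "u4"]

# Client A holds one feature and the labels.
client_a = {"age": [25, 40, 31, 58]}
labels = [0, 1, 0, 1]

# Client B holds a different feature for the same users, and no labels.
client_b = {"income": [30, 85, 42, 90]}


def gini(ys):
    """Gini impurity of a list of binary (0/1) labels."""
    if not ys:
        return 0.0
    p = sum(ys) / len(ys)
    return 2.0 * p * (1.0 - p)


def split_impurity(feature_values, ys, threshold):
    """Weighted Gini impurity after splitting on feature <= threshold."""
    left = [y for x, y in zip(feature_values, ys) if x <= threshold]
    right = [y for x, y in zip(feature_values, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Scoring a split on client B's feature combines B's values with A's labels.
# Done in plaintext, as here, one party would have to reveal its data to the
# other; that disclosure is what a privacy-preserving protocol must avoid.
impurity_before = gini(labels)
impurity_b = split_impurity(client_b["income"], labels, threshold=50)
```

In this toy data the split on `income` at 50 separates the two label classes completely, so `impurity_b` falls below `impurity_before`, yet neither client could have computed that from its own data alone.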
