CR · LG · Aug 14, 2020

Privacy Preserving Vertical Federated Learning for Tree-based Models

arXiv:2008.06170v1 · 259 citations
AI Analysis

This paper addresses privacy concerns for organizations that collaborate on machine learning when their features are disjoint and a single party holds the labels, offering a novel method for secure decision tree training.

The paper tackles the problem of training tree-based models in vertical federated learning without revealing private data. It proposes Pivot, a solution that protects against a semi-honest adversary compromising up to m-1 of the m clients, and it supports its efficiency claim with theoretical and experimental analysis.

Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies vertical federated learning, which covers scenarios where (i) collaborating organizations own data of the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propose Pivot, a novel solution for privacy preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than what the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise $m-1$ out of $m$ clients. We further identify two privacy leakages that arise when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision tree (GBDT), by treating single decision trees as building blocks. Theoretical and experimental analyses suggest that Pivot is efficient for the privacy achieved.
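
The vertical setting described in the abstract can be made concrete with a small data-layout sketch. This is not Pivot itself and contains no cryptography. It only illustrates the two stated conditions: every client holds rows for the same users but a disjoint set of feature columns, and exactly one client holds the labels. The names `Client` and `vertical_partition`, the column names, and the toy values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Client:
    """One organization in a hypothetical vertical FL setup."""
    name: str
    features: Dict[str, List[float]]    # column name -> one value per user
    labels: Optional[List[int]] = None  # set only on the label-holding client


def vertical_partition(
    columns: Dict[str, List[float]],
    labels: List[int],
    assignment: Dict[str, str],
    label_holder: str,
) -> List[Client]:
    """Split a table by column: each column goes to exactly one client."""
    clients: Dict[str, Client] = {}
    for col, owner in assignment.items():
        client = clients.setdefault(owner, Client(owner, {}))
        client.features[col] = columns[col]
    # The label holder is created even if it owns no feature columns.
    clients.setdefault(label_holder, Client(label_holder, {})).labels = labels
    return list(clients.values())


# Toy table: four users, three features, binary labels (illustrative values).
columns = {
    "age": [25.0, 40.0, 31.0, 58.0],
    "income": [30.0, 72.0, 55.0, 64.0],
    "purchases": [3.0, 11.0, 7.0, 2.0],
}
labels = [0, 1, 1, 0]
assignment = {"age": "bank", "income": "bank", "purchases": "shop"}

clients = vertical_partition(columns, labels, assignment, label_holder="bank")

for c in clients:
    has_labels = c.labels is not None
    print(c.name, sorted(c.features), "holds labels:", has_labels)
```

In this layout, evaluating a candidate split on `purchases` needs feature values that only `shop` has and labels that only `bank` has. That cross-party dependency is the step a privacy-preserving protocol has to carry out without exposing either side's raw data.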

