Hanwen Feng

LG · Oct 4, 2022 · Code

OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization

Xiaochen Li, Yuke Hu, Weiran Liu et al.

Vertical Federated Learning (FL) is a new paradigm that enables users who hold non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent works show that this is still not sufficient to prevent privacy leakage from the training process or the trained model. This paper studies privacy-preserving tree boosting algorithms under vertical FL. Existing solutions based on cryptography involve heavy computation and communication overhead and are vulnerable to inference attacks. Although the solution based on Local Differential Privacy (LDP) addresses these problems, it lowers the accuracy of the trained model. This paper explores how to improve the accuracy of widely deployed tree boosting algorithms that satisfy differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost. Three order-preserving desensitization algorithms satisfying a variant of LDP called distance-based LDP (dLDP) are designed to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a trade-off between the privacy of pairs with large distance and the utility of desensitized values. Comprehensive evaluations show that OpBoost achieves better prediction accuracy for the trained models than existing LDP approaches under reasonable settings. Our code is open source.
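To illustrate the idea of distance-based desensitization, the following is a minimal sketch of a generic exponential mechanism over an integer domain, where an output y is chosen with probability proportional to exp(-eps * |x - y| / 2). This is a textbook-style construction assumed here for illustration only; it is not one of the three OpBoost algorithms, and the names `dldp_probs` and `dldp_perturb` are hypothetical.

```python
import math
import random


def dldp_probs(x, lo, hi, eps):
    """Output distribution over the integers lo..hi for input x.

    Pr[y | x] is proportional to exp(-eps * |x - y| / 2), so nearby values
    are far more likely than distant ones and order tends to be preserved
    between inputs that are far apart.
    """
    weights = [math.exp(-eps * abs(x - y) / 2) for y in range(lo, hi + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def dldp_perturb(x, lo, hi, eps, rng=random):
    """Draw one desensitized value for x from the distribution above."""
    domain = list(range(lo, hi + 1))
    return rng.choices(domain, weights=dldp_probs(x, lo, hi, eps), k=1)[0]
```

For any two inputs x and x', the ratio Pr[y | x] / Pr[y | x'] in this sketch is at most exp(eps * |x - x'|), so close inputs are nearly indistinguishable while distant inputs receive a weaker guarantee, which is the trade-off the abstract describes.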

CR · Sep 29, 2020

Privacy Enhancement via Dummy Points in the Shuffle Model

Xiaochen Li, Weiran Liu, Hanwen Feng et al.

The shuffle model was recently proposed to address the severe utility loss in Local Differential Privacy (LDP) caused by distributed data randomization. In the shuffle model, a shuffler is used to break the link between the user identity and the message uploaded to the data analyst. Since less noise needs to be introduced to achieve the same privacy guarantee, this paradigm improves the utility of privacy-preserving data collection. We propose DUMP (DUMmy-Point-based), a framework for privacy-preserving histogram estimation in the shuffle model. The core of DUMP is a new concept of dummy blanket, which enables enhancing privacy by just introducing dummy points on the user side and further improving the utility of the shuffle model. We instantiate DUMP by proposing two protocols, pureDUMP and mixDUMP, and conduct a comprehensive experimental evaluation to compare them with existing protocols. The experimental results show that, under the same privacy guarantee, (1) the proposed protocols have significant improvements in communication efficiency over all existing multi-message protocols, by at least 3 orders of magnitude; (2) they achieve competitive utility, while the only known protocol (Ghazi et al., PMLR 2020) with better utility than ours employs hard-to-exactly-sample distributions that are vulnerable to floating-point attacks (CCS 2012).
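As a rough illustration of histogram estimation with dummy points, the sketch below has each user send their true value plus s uniformly random dummy values; a shuffler then permutes all messages, and the analyst subtracts the expected dummy count per bin. It is a simplified sketch under these assumptions and not the exact pureDUMP or mixDUMP protocol; the function names and the parameters k (domain size) and s (dummies per user) are hypothetical.

```python
import random
from collections import Counter


def user_messages(value, k, s, rng):
    """One user's upload: the true value plus s uniform dummy points in 0..k-1."""
    return [value] + [rng.randrange(k) for _ in range(s)]


def shuffle_and_estimate(values, k, s, seed=0):
    """Simulate users, shuffler, and analyst; return estimated frequencies."""
    rng = random.Random(seed)
    pool = []
    for v in values:
        pool.extend(user_messages(v, k, s, rng))
    rng.shuffle(pool)  # the shuffler hides which user sent which message

    counts = Counter(pool)
    n = len(values)
    expected_dummies = n * s / k  # expected dummy count in each bin
    return [(counts[v] - expected_dummies) / n for v in range(k)]
```

Because dummies are uniform over the domain, subtracting n * s / k from each bin gives an unbiased frequency estimate in this sketch, while the dummy messages mask each individual user's contribution.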

2 Papers