LG · CR · Apr 6, 2024

Hyperparameter Optimization for SecureBoost via Constrained Multi-Objective Federated Learning

arXiv:2404.04490v1 · 2 citations · h-index: 20
Originality: Incremental advance
AI Analysis

The paper addresses the suboptimal trade-off between utility, privacy, and efficiency in SecureBoost, which is used in fields such as finance and healthcare. The advance is incremental because it builds on existing SecureBoost methods.

The paper tackles the vulnerability of SecureBoost to label leakage in vertical federated learning, proposing a constrained multi-objective optimization algorithm (CMOSB) that yields superior hyperparameters for balancing utility loss, training cost, and privacy leakage compared to grid search and Bayesian optimization.

SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption (HE) to protect data privacy in vertical federated learning. SecureBoost and its variants have been widely adopted in fields such as finance and healthcare. However, the hyperparameters of SecureBoost are typically configured heuristically to optimize model performance (i.e., utility) alone, on the assumption that privacy is already secured. Our study found that SecureBoost and some of its variants are still vulnerable to label leakage. Because of this vulnerability, the current heuristic hyperparameter configuration of SecureBoost may yield a suboptimal trade-off between utility, privacy, and efficiency, which are pivotal elements of a trustworthy federated learning system. To address this issue, we propose the Constrained Multi-Objective SecureBoost (CMOSB) algorithm, which aims to approximate Pareto optimal solutions, where each solution is a set of hyperparameters achieving an optimal trade-off between utility loss, training cost, and privacy leakage. We design measurements of the three objectives, including a novel label inference attack named instance clustering attack (ICA) to measure the privacy leakage of SecureBoost. Additionally, we provide two countermeasures against ICA. The experimental results demonstrate that CMOSB yields superior hyperparameters over those optimized by grid search and Bayesian optimization with regard to the trade-off between utility loss, training cost, and privacy leakage.
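
To make the notion of a constrained Pareto optimal solution concrete, the following is a minimal sketch, not the authors' CMOSB algorithm. It assumes each hyperparameter configuration has already been scored on the three objectives named in the abstract (all minimized) and that each objective has an upper bound acting as the constraint. The `Candidate` type, the function names, the bounds, and all numeric values are hypothetical and purely illustrative; they are not results from the paper.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A hypothetical hyperparameter configuration with its three scores."""
    name: str
    utility_loss: float     # lower is better
    training_cost: float    # lower is better
    privacy_leakage: float  # lower is better


def objectives(c: Candidate) -> Tuple[float, float, float]:
    return (c.utility_loss, c.training_cost, c.privacy_leakage)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if a is no worse on every objective and better on one."""
    fa, fb = objectives(a), objectives(b)
    no_worse = all(x <= y for x, y in zip(fa, fb))
    strictly_better = any(x < y for x, y in zip(fa, fb))
    return no_worse and strictly_better


def constrained_pareto_front(
    candidates: List[Candidate],
    upper_bounds: Tuple[float, float, float],
) -> List[Candidate]:
    """Drop candidates that violate a bound, then keep the non-dominated ones."""
    feasible = [
        c for c in candidates
        if all(v <= u for v, u in zip(objectives(c), upper_bounds))
    ]
    return [
        c for c in feasible
        if not any(dominates(other, c) for other in feasible)
    ]


# Illustrative scores only (assumed values, not measurements).
candidates = [
    Candidate("A", 0.02, 120.0, 0.60),
    Candidate("B", 0.03, 80.0, 0.40),
    Candidate("C", 0.04, 90.0, 0.45),   # dominated by B on all objectives
    Candidate("D", 0.01, 300.0, 0.30),  # violates the training-cost bound
    Candidate("E", 0.05, 60.0, 0.20),
]

front = constrained_pareto_front(candidates, upper_bounds=(0.05, 200.0, 0.70))
print([c.name for c in front])  # prints ['A', 'B', 'E']
```

In this sketch no surviving configuration is better than another on all three objectives at once, which is what makes the choice among them a trade-off. A search method such as the one the abstract describes would generate and refine the candidates; this filter only shows how the final set is characterized.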

