SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree
This work addresses slow, poorly scaling privacy-preserving machine learning for data owners in vertical federated settings; it is a substantial but incremental improvement over existing methods.
The paper tackles the inefficiency and limited scalability of SecureBoost, a popular vertical federated learning algorithm for gradient boosting decision trees, by proposing SecureBoost+, which trains 6-35x faster at the same accuracy and scales to tens of millions of data samples.
Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm that is widely used in industry because of its good performance and easy interpretability. Because of data isolation and privacy requirements, many works use vertical federated learning to train machine learning models collaboratively across different data owners with privacy guarantees. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, to preserve privacy, SecureBoost involves complex training procedures and time-consuming cryptographic operations. As a result, SecureBoost is slow to train and does not scale to large datasets. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. Like SecureBoost, SecureBoost+ is secure in the semi-honest model. It scales easily to tens of millions of data samples. SecureBoost+ achieves high performance through several novel optimizations of SecureBoost, including ciphertext operation optimization, new training mechanisms, and multi-classification training optimization. Experimental results show that SecureBoost+ is 6-35x faster than SecureBoost at the same accuracy and scales to tens of millions of data samples and thousands of feature dimensions.
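To make the "time-consuming cryptographic operations" concrete, the following is a minimal sketch of the kind of ciphertext work a vertical federated GBDT performs when building gradient histograms. It assumes an additively homomorphic scheme in the style of Paillier; the abstract does not name the scheme, and nothing here is taken from the SecureBoost+ implementation. The key size, the fixed-point `SCALE`, and every function name are hypothetical, and the toy primes are far too small for real use.

```python
import math
import random

SCALE = 10**6  # assumed fixed-point precision for encoding float gradients


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two Mersenne primes (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, value):
    """Encrypt a float as a fixed-point integer; negatives wrap modulo n."""
    m = round(value * SCALE) % n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    n2 = n * n
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)


def decrypt(n, priv, c):
    """Recover the fixed-point plaintext and map it back to a signed float."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    if m > n // 2:
        m -= n
    return m / SCALE


def encrypted_histogram(n, enc_values, bin_ids, num_bins):
    """Party without labels: sum ciphertexts per feature bin, never decrypting."""
    hist = [encrypt(n, 0.0) for _ in range(num_bins)]
    for c, b in zip(enc_values, bin_ids):
        hist[b] = add(n, hist[b], c)
    return hist


# Party holding the labels computes gradients and encrypts them.
n, priv = keygen()
grads = [0.5, -0.25, 0.75, -0.5, 0.125, 0.25]
enc_grads = [encrypt(n, g) for g in grads]

# Party holding a feature column knows only each sample's bin index.
bin_ids = [0, 2, 1, 0, 2, 1]
enc_hist = encrypted_histogram(n, enc_grads, bin_ids, num_bins=3)

# Label holder decrypts the per-bin sums to evaluate candidate splits.
hist = [decrypt(n, priv, c) for c in enc_hist]
print(hist)
```

Every sample costs one encryption and one big-integer multiplication per feature, and every bin costs one decryption, which suggests why ciphertext handling tends to dominate training time at the scale of tens of millions of samples.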