FedSDG-FS: Efficient and Secure Feature Selection for Vertical Federated Learning
This work addresses a practical bottleneck in vertical federated learning, where data owners hold overlapping samples but different features. It offers an incremental improvement over prior methods by removing their need for prior knowledge of the number of noisy features or of a post-training selection threshold.
The paper tackles feature selection in vertical federated learning, where existing methods require impractical prior knowledge. It proposes FedSDG-FS, which uses a Gaussian stochastic dual-gate and a Gini impurity-based initialization to select features efficiently and securely. Experiments on synthetic and real-world datasets show improved model performance and accurate feature selection.
Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about largely overlapping sets of data samples, to jointly train a useful global model. Feature selection (FS) is important to VFL, yet it remains an open research problem: existing FS methods designed for VFL assume prior knowledge of either the number of noisy features or the post-training threshold for selecting useful features, which makes them unsuitable for practical applications. To bridge this gap, we propose the Federated Stochastic Dual-Gate based Feature Selection (FedSDG-FS) approach. It consists of a Gaussian stochastic dual-gate that efficiently approximates the probability of a feature being selected, with privacy protected through Partially Homomorphic Encryption and without a trusted third party. To reduce overhead, we propose a feature importance initialization method based on Gini impurity that requires only two parameter transmissions between the server and the clients. Extensive experiments on both synthetic and real-world datasets show that FedSDG-FS significantly outperforms existing approaches, both in accurately selecting high-quality features and in building global models with improved performance.
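The abstract says the Gaussian stochastic dual-gate approximates the probability that a feature is selected, but gives no equations. A minimal sketch, assuming the clipped-Gaussian relaxation from stochastic gates (Yamada et al., ICML 2020): each feature d has a learnable mean mu_d, a noisy gate z_d = clip(mu_d + eps_d, 0, 1) with eps_d ~ N(0, sigma^2) scales that feature during training, and the selection probability is P(z_d > 0) = Phi(mu_d / sigma). How FedSDG-FS places two such gates across clients and server is not modeled here, and the class and parameter names are our own.

```python
import math
import random

def std_normal_cdf(x):
    """Standard normal CDF Phi(x), computed with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

class GaussianStochasticGate:
    """Clipped-Gaussian gates z_d = clip(mu_d + eps_d, 0, 1), eps_d ~ N(0, sigma^2).

    Hypothetical building block modeled on stochastic gates (Yamada et al., 2020);
    it is an assumption, not the FedSDG-FS dual-gate itself.
    """

    def __init__(self, mu, sigma=0.5, seed=None):
        self.mu = list(mu)          # one learnable mean per feature
        self.sigma = sigma          # fixed noise scale
        self.rng = random.Random(seed)

    def sample(self, training=True):
        # Training: inject Gaussian noise; inference: use the noise-free mean.
        noise = [self.rng.gauss(0.0, self.sigma) if training else 0.0 for _ in self.mu]
        return [min(1.0, max(0.0, m + e)) for m, e in zip(self.mu, noise)]

    def open_probabilities(self):
        # P(z_d > 0) = P(eps_d > -mu_d) = Phi(mu_d / sigma)
        return [std_normal_cdf(m / self.sigma) for m in self.mu]

    def expected_open_gates(self):
        # Expected number of selected features; a smooth stand-in for an L0 penalty.
        return sum(self.open_probabilities())

    def apply(self, x, training=True):
        # Multiply each feature by its (possibly noisy) gate value.
        return [xi * zi for xi, zi in zip(x, self.sample(training))]

gate = GaussianStochasticGate(mu=[0.5, 0.0, -0.5], sigma=0.5, seed=0)
probs = gate.open_probabilities()
deterministic = gate.apply([1.0, 2.0, 3.0], training=False)
print(probs, deterministic)
```

Summing the open probabilities gives a differentiable estimate of how many features are kept, which can be penalized in the training loss so that the means mu_d are learned by gradient descent together with the model, without fixing the number of selected features in advance.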
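The abstract states that the Gini-based initialization needs only two parameter transmissions and that privacy comes from Partially Homomorphic Encryption without a trusted third party. The sketch below shows one plausible way these could fit together; it is our assumption, not the paper's protocol. The label-holding server encrypts one-hot labels with a toy Paillier scheme (transmission 1), a client homomorphically sums them per feature bin and returns encrypted class counts (transmission 2), and the server decrypts the counts and scores the feature by Gini impurity reduction. All function names are hypothetical, the single-threshold split is a simplification, and the 12-bit primes are for illustration only and provide no security.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic), with g = n + 1.
# The 12-bit primes make this insecure; they only keep the example fast.
def keygen(p=2357, q=2551):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    # (n + 1)^m = 1 + m*n (mod n^2)
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    # Enc(a) * Enc(b) mod n^2 decrypts to a + b
    return (c1 * c2) % (n * n)

def gini(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical transmission 1 (server -> client): encrypted one-hot labels.
def server_encrypt_labels(n, labels, num_classes, rng):
    return [[encrypt(n, int(y == k), rng) for k in range(num_classes)] for y in labels]

# Hypothetical transmission 2 (client -> server): encrypted class counts per bin.
# The client never sees plaintext labels; it only multiplies ciphertexts.
def client_encrypted_bin_counts(n, enc_labels, feature, threshold, num_classes):
    bins = [[1] * num_classes for _ in range(2)]  # 1 decrypts to 0
    for enc_row, x in zip(enc_labels, feature):
        b = 0 if x <= threshold else 1
        for k in range(num_classes):
            bins[b][k] = add_encrypted(n, bins[b][k], enc_row[k])
    return bins

# Server side: decrypt counts and score the split by Gini impurity reduction.
def server_gini_importance(n, sk, enc_bins):
    counts = [[decrypt(n, sk, c) for c in row] for row in enc_bins]
    parent = [sum(col) for col in zip(*counts)]
    total = sum(parent)
    weighted = sum(sum(row) / total * gini(row) for row in counts)
    return gini(parent) - weighted

rng = random.Random(0)
n, sk = keygen()
labels = [0, 0, 1, 1, 0, 1]
informative = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7]  # splits the classes at 0.5
noisy = [0.1, 0.9, 0.2, 0.8, 0.6, 0.4]
enc_labels = server_encrypt_labels(n, labels, 2, rng)
imp_informative = server_gini_importance(
    n, sk, client_encrypted_bin_counts(n, enc_labels, informative, 0.5, 2))
imp_noisy = server_gini_importance(
    n, sk, client_encrypted_bin_counts(n, enc_labels, noisy, 0.5, 2))
print(imp_informative, imp_noisy)
```

Here the informative feature separates the classes perfectly and scores 0.5, while the noisy one scores about 0.056. Scores like these could then seed the gate means before training, which is again an assumption about how the initialization is used. In this sketch the server still learns per-bin class counts, which a real protocol may need to protect further.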