Federated Automated Feature Engineering
This work addresses the lack of AutoFE methods for federated learning, enabling improved predictive performance in privacy-sensitive distributed data scenarios.
The paper tackles automated feature engineering (AutoFE) in federated learning (FL) settings, where data is distributed across clients and is not shared, and introduces algorithms for the horizontal, vertical, and hybrid FL cases. It shows that the federated AutoFE algorithms achieve test scores close to those of centralized AutoFE.
Automated feature engineering (AutoFE) automatically creates new features from the original features to improve predictive performance without requiring significant human intervention or domain expertise. Many AutoFE algorithms exist, but very few address the federated learning (FL) setting, where data is gathered across many clients and is not shared between clients or with a central server. We introduce AutoFE algorithms for the horizontal, vertical, and hybrid FL settings, which differ in how the data is gathered across clients. To the best of our knowledge, we are the first to develop AutoFE algorithms for the horizontal and hybrid FL cases, and we show that the downstream test scores of our federated AutoFE algorithms are close to those obtained when the data is held centrally and AutoFE is performed centrally.
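To make the three settings concrete, the following is a minimal sketch of how a single data matrix could be partitioned across clients in each case. It illustrates the data layouts only and is not the paper's AutoFE algorithm. The function names, the toy matrix `X`, and the regular grid used for the hybrid case are assumptions made for illustration; in practice a hybrid partition need not form a regular grid.

```python
import numpy as np

def horizontal_split(X, n_clients):
    """Horizontal FL: each client holds different rows (samples), all columns (features)."""
    return np.array_split(X, n_clients, axis=0)

def vertical_split(X, n_clients):
    """Vertical FL: each client holds all rows (samples), different columns (features)."""
    return np.array_split(X, n_clients, axis=1)

def hybrid_split(X, n_row_blocks, n_col_blocks):
    """Hybrid FL (illustrative grid): each client holds a subset of rows and of columns."""
    return [
        block
        for rows in np.array_split(X, n_row_blocks, axis=0)
        for block in np.array_split(rows, n_col_blocks, axis=1)
    ]

# Toy data: 6 samples, 4 features (hypothetical values).
X = np.arange(24, dtype=float).reshape(6, 4)

print([p.shape for p in horizontal_split(X, 3)])  # [(2, 4), (2, 4), (2, 4)]
print([p.shape for p in vertical_split(X, 2)])    # [(6, 2), (6, 2)]
print([p.shape for p in hybrid_split(X, 3, 2)])   # six blocks, each (2, 2)
```

In the horizontal case every client can compute any engineered feature locally, since it sees all columns for its own samples. In the vertical and hybrid cases, a feature that combines columns held by different clients cannot be computed by a single client on its own, which is one reason the settings call for different algorithms.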