LG · Nov 22, 2022

OpenFE: Automated Feature Generation with Expert-level Performance

arXiv:2211.12507v3 · 63 citations · h-index: 70 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the labor-intensive task of manual feature generation for machine learning practitioners, offering an incremental improvement through a novel automated tool.

OpenFE tackles automated feature generation for tabular data with a tool that efficiently identifies effective features among a large pool of candidates. It is competitive with human experts and outperforms baseline methods on benchmarks; in two Kaggle competitions, its generated features combined with a simple baseline model beat 99.3% and 99.6% of participating teams, respectively.

The goal of automated feature generation is to liberate machine learning experts from the laborious task of manual feature generation, which is crucial for improving learning performance on tabular data. The major challenge in automated feature generation is to efficiently and accurately identify effective features from a vast pool of candidate features. In this paper, we present OpenFE, an automated feature generation tool that provides competitive results against machine learning experts. OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner. Extensive experiments on ten benchmark datasets show that OpenFE outperforms existing baseline methods by a large margin. We further evaluate OpenFE in two Kaggle competitions with thousands of data science teams participating. In the two competitions, features generated by OpenFE with a simple baseline model beat 99.3% and 99.6% of the data science teams, respectively. In addition to the empirical results, we provide a theoretical perspective to show that feature generation can be beneficial in a simple yet representative setting. The code is available at https://github.com/ZhangTP1996/OpenFE.
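The two ideas named in the abstract, scoring a candidate by its incremental benefit over an existing model and pruning candidates from coarse to fine, can be illustrated with a small toy sketch. This is not the OpenFE implementation and does not use the OpenFE library: the function names (`generate_candidates`, `residual_gain`, `two_stage_select`), the linear base model, the pairwise operators, and the subsample-then-full-data pruning scheme are all assumptions chosen to keep the example short and dependent only on NumPy.

```python
import itertools
import numpy as np


def generate_candidates(X, names):
    """Enumerate simple pairwise candidates (hypothetical operator set: *, +, -)."""
    cands = {}
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cands[f"{names[i]}*{names[j]}"] = X[:, i] * X[:, j]
        cands[f"{names[i]}+{names[j]}"] = X[:, i] + X[:, j]
        cands[f"{names[i]}-{names[j]}"] = X[:, i] - X[:, j]
    return cands


def residual_gain(feature, residual):
    """Per-row drop in squared error when a one-variable linear fit on
    `feature` is added on top of the base model's predictions.
    A crude stand-in for 'incremental performance', not OpenFE's method."""
    f = feature - feature.mean()
    r = residual - residual.mean()
    denom = f @ f
    if denom == 0:
        return 0.0
    # Closed form for least squares of r on f: gain = (f.r)^2 / (f.f)
    return float((f @ r) ** 2 / denom / len(r))


def two_stage_select(X, y, names, coarse_rows=200, keep=5, seed=0):
    """Coarse stage: score all candidates on a row subsample, keep the top few.
    Fine stage: rescore the survivors on all rows and rank them."""
    rng = np.random.default_rng(seed)

    # Base model: ordinary least squares on the original features.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef

    cands = generate_candidates(X, names)

    # Coarse stage on a cheap subsample of rows.
    idx = rng.choice(len(X), size=min(coarse_rows, len(X)), replace=False)
    coarse = {n: residual_gain(f[idx], residual[idx]) for n, f in cands.items()}
    survivors = sorted(coarse, key=coarse.get, reverse=True)[:keep]

    # Fine stage on the full data, survivors only.
    fine = {n: residual_gain(cands[n], residual) for n in survivors}
    return sorted(fine.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data where the target depends on an interaction the base model cannot express.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=1000)

ranked = two_stage_select(X, y, ["x0", "x1", "x2", "x3"])
for name, gain in ranked:
    print(f"{name}: {gain:.4f}")
```

Scoring against the base model's residual, rather than against the raw target, is what makes the score incremental: sums and differences of the original columns add almost nothing to a linear base model, so they score near zero, while the interaction term the base model misses ranks first. The coarse stage discards most candidates cheaply so that the full-data evaluation runs on only a few survivors.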

Code Implementations: 2 repos