ML · LG · Apr 29

SCOPE-FE: Structured Control of Operator and Pairwise Exploration for Feature Engineering

arXiv:2604.27025 · 14.6
Predicted impact: top 77% in ML · last 90 days
Originality: Incremental advance
AI Analysis

For practitioners of tabular learning, this work addresses the scalability bottleneck of expand-and-reduce feature engineering methods, offering a more efficient approach for high-dimensional data.

SCOPE-FE reduces the computational cost of automatic feature engineering by pruning the operator and feature-pair spaces before candidate generation, achieving substantial time savings on high-dimensional datasets while maintaining predictive performance comparable to existing methods.
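To make the effect of pruning before candidate generation concrete, the sketch below counts candidate features under simplifying assumptions that are not taken from the paper: only binary, symmetric operators are considered, each operator is applied once per unordered feature pair, and clusters are treated as a hard partition (the paper uses fuzzy c-means, which may assign a feature to more than one cluster). The function names and the example sizes are hypothetical.

```python
from math import comb


def candidate_count(n_features: int, n_binary_ops: int) -> int:
    """Candidates when every binary operator is applied to every unordered feature pair."""
    return n_binary_ops * comb(n_features, 2)


def clustered_candidate_count(cluster_sizes: list[int], n_binary_ops: int) -> int:
    """Candidates when pairs are formed only within each cluster (hard partition assumed)."""
    return n_binary_ops * sum(comb(size, 2) for size in cluster_sizes)


# Hypothetical setting: 100 features and 10 binary operators, no pruning.
full = candidate_count(100, 10)

# Hypothetical pruned setting: 4 operators dropped, features split into 4 clusters of 25.
pruned = clustered_candidate_count([25, 25, 25, 25], 6)

print(full, pruned)  # 49500 7200
```

Under these assumptions the unpruned count grows quadratically with the number of features, while the within-cluster count grows with the sum of squared cluster sizes, which is why the savings would be largest on high-dimensional data.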

Automatic feature engineering is an effective approach for improving predictive performance in tabular learning. However, expand-and-reduce methods, such as OpenFE, become increasingly computationally expensive as the input dimensionality grows. This limitation arises primarily from the combinatorial explosion of candidate features generated through operator-feature combinations. To address this issue, we propose SCOPE-FE, a structured search space control framework that improves efficiency by reducing the candidate space prior to feature generation. SCOPE-FE jointly regulates two major sources of combinatorial growth: the operator space and feature-pair space. First, OperatorProbing estimates the dataset-specific utility of candidate operators and eliminates low-contribution operators in advance. Second, FeatureClustering employs spectral embedding and fuzzy c-means clustering to group structurally related features, thereby restricting candidate generation to relevant within-cluster combinations. In addition, we introduce ReliabilityScoring, which incorporates variance across subsamples to stabilize pruning decisions. Experiments on ten benchmark datasets demonstrate that SCOPE-FE substantially reduces feature engineering time while maintaining competitive predictive performance relative to existing baselines. The efficiency gains are particularly pronounced for high-dimensional datasets. These results indicate that structured control of the search space is an effective strategy for scalable automatic feature engineering. The code will be made publicly available upon acceptance.
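The abstract states only that ReliabilityScoring incorporates variance across subsamples; it does not give a formula. One plausible form, shown here purely as an assumption, is a mean-minus-spread score: an operator is kept only if its average estimated gain across subsamples stays above a threshold after subtracting a penalty proportional to the standard deviation. The names `reliability_score`, `keep_operators`, `penalty`, and `threshold`, as well as the probe values, are hypothetical.

```python
from statistics import mean, pstdev


def reliability_score(gains: list[float], penalty: float = 1.0) -> float:
    """Mean gain across subsamples minus a penalty on its spread (assumed form)."""
    return mean(gains) - penalty * pstdev(gains)


def keep_operators(
    gains_by_operator: dict[str, list[float]],
    threshold: float = 0.0,
    penalty: float = 1.0,
) -> list[str]:
    """Return, in sorted order, the operators whose penalized score exceeds the threshold."""
    return sorted(
        op
        for op, gains in gains_by_operator.items()
        if reliability_score(gains, penalty) > threshold
    )


# Hypothetical per-subsample gains estimated for three operators.
probe = {
    "add": [0.02, 0.03, 0.025],     # small but consistent gain
    "divide": [0.10, -0.08, 0.01],  # positive on average, but unstable
    "log": [-0.01, 0.0, -0.005],    # no gain
}

kept = keep_operators(probe)
print(kept)  # ['add']
```

In this sketch "divide" has a positive mean gain but is pruned because its gain varies widely between subsamples, which illustrates how a variance term can stabilize a pruning decision that a plain mean would leave to chance.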
