AI · Jun 25, 2025

Tabular Feature Discovery With Reasoning Type Exploration

arXiv:2506.20357v1 · 3 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This paper addresses feature engineering for tabular data in machine learning and represents an incremental improvement over existing LLM-based approaches.

Existing LLM-based feature engineering for tabular data often produces overly simple or repetitive features. The paper proposes REFeat, a method that guides the LLM with multiple reasoning types during feature generation. Experiments on 59 benchmark datasets show higher predictive accuracy on average and more diverse and meaningful features.

Feature engineering for tabular data remains a critical yet challenging step in machine learning. Recently, large language models (LLMs) have been used to automatically generate new features by leveraging their vast knowledge. However, existing LLM-based approaches often produce overly simple or repetitive features, partly due to inherent biases in the transformations the LLM chooses and the lack of structured reasoning guidance during generation. In this paper, we propose a novel method REFeat, which guides an LLM to discover diverse and informative features by leveraging multiple types of reasoning to steer the feature generation process. Experiments on 59 benchmark datasets demonstrate that our approach not only achieves higher predictive accuracy on average, but also discovers more diverse and meaningful features. These results highlight the promise of incorporating rich reasoning paradigms and adaptive strategy selection into LLM-driven feature discovery for tabular data.
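
The abstract mentions "adaptive strategy selection" over reasoning types but does not say how the selection is done. As a rough illustration only, the sketch below uses an epsilon-greedy bandit, which is a stand-in chosen here and not necessarily what REFeat does. The reasoning-type labels, the `ReasoningTypeSelector` class, and the `build_prompt` helper are all hypothetical names introduced for this example; the reward is assumed to be some validation-score gain from the feature an LLM proposes.

```python
import random

# Hypothetical labels: the abstract does not enumerate the reasoning types.
REASONING_TYPES = ["deductive", "inductive", "abductive", "analogical"]


class ReasoningTypeSelector:
    """Epsilon-greedy choice among reasoning types (an assumed stand-in)."""

    def __init__(self, arms, epsilon=0.2, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}

    def select(self):
        # Try every reasoning type once before exploiting.
        untried = [a for a in self.arms if self.counts[a] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise pick the best mean reward.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.means[a])

    def update(self, arm, reward):
        # Incremental running mean of the reward observed for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def build_prompt(reasoning_type, columns, target):
    """Compose a feature-generation prompt for the chosen reasoning type."""
    return (
        f"Using {reasoning_type} reasoning, propose one new feature "
        f"computed from the columns {', '.join(columns)} "
        f"that may help predict '{target}'."
    )
```

In a full loop, each round would call `select()`, send the `build_prompt(...)` text to an LLM, evaluate the returned feature on held-out data, and pass the score change to `update()`. Those steps are omitted because the abstract gives no detail on them.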

