ReFuGe: Feature Generation for Prediction Tasks on Relational Databases with LLM Agents
This work addresses the challenge of improving predictive performance on relational databases for applications such as web data management. It appears incremental, however: it builds on existing feature generation methods, and its main novelty is the agent-based approach.
The paper tackles the problem of generating informative relational features for prediction tasks on relational databases, proposing ReFuGe, an agentic framework with specialized LLM agents, and demonstrates substantial performance improvements on RDB benchmarks.
Relational databases (RDBs) play a crucial role in many real-world web applications, supporting data management across multiple interconnected tables. Beyond typical retrieval-oriented tasks, prediction tasks on RDBs have recently gained attention. In this work, we address this problem by generating informative relational features that enhance predictive performance. However, generating such features is challenging: it requires reasoning over complex schemas and exploring a combinatorially large feature space, all without explicit supervision. To address these challenges, we propose ReFuGe, an agentic framework that leverages specialized large language model agents: (1) a schema selection agent identifies the tables and columns relevant to the task, (2) a feature generation agent produces diverse candidate features from the selected schema, and (3) a feature filtering agent evaluates and retains promising features through reasoning-based and validation-based filtering. The three agents operate within an iterative feedback loop until performance converges. Experiments on RDB benchmarks demonstrate that ReFuGe substantially improves performance on various RDB prediction tasks. Our code and datasets are available at https://github.com/K-Kyungho/REFUGE.
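The three-agent loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM agents are replaced by deterministic stubs, and the toy tables, the function names (`select_schema`, `generate_features`, `validate`, `refine`), the absolute-correlation score, and the stopping rule (stop after a round with no improvement) are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Toy database (hypothetical data): a target table, one child table, one unrelated table.
USERS = [{"user_id": 1, "churn": 0}, {"user_id": 2, "churn": 0},
         {"user_id": 3, "churn": 1}, {"user_id": 4, "churn": 1}]
ORDERS = [{"user_id": 1, "amount": 40.0}, {"user_id": 1, "amount": 60.0},
          {"user_id": 1, "amount": 20.0}, {"user_id": 2, "amount": 30.0},
          {"user_id": 2, "amount": 35.0}, {"user_id": 3, "amount": 90.0},
          {"user_id": 4, "amount": 10.0}]
SCHEMA = {"users": ["user_id", "churn"], "orders": ["user_id", "amount"],
          "ads": ["ad_id", "cost"]}

Feature = Callable[[int], float]  # maps a user_id to a feature value


def select_schema(schema: Dict[str, List[str]], key: str) -> Dict[str, List[str]]:
    """Stub for the schema selection agent: keep tables that contain the join key."""
    return {table: cols for table, cols in schema.items() if key in cols}


def generate_features(selected: Dict[str, List[str]], round_idx: int) -> Dict[str, Feature]:
    """Stub for the feature generation agent: propose one aggregate per round."""
    def amounts(uid: int) -> List[float]:
        return [row["amount"] for row in ORDERS if row["user_id"] == uid]

    pool: List[Dict[str, Feature]] = [
        {"order_sum": lambda uid: sum(amounts(uid))},
        {"order_count": lambda uid: float(len(amounts(uid)))},
    ]
    if "orders" in selected and round_idx < len(pool):
        return pool[round_idx]
    return {}


def abs_corr(xs: List[float], ys: List[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def validate(features: Dict[str, Feature]) -> float:
    """Toy validation score (an assumption): best single-feature correlation with the label."""
    labels = [float(u["churn"]) for u in USERS]
    return max((abs_corr([f(u["user_id"]) for u in USERS], labels)
                for f in features.values()), default=0.0)


def refine(max_rounds: int = 5, tol: float = 1e-6) -> Tuple[Dict[str, Feature], float]:
    """Iterate generation and filtering until a round brings no improvement."""
    selected = select_schema(SCHEMA, key="user_id")
    kept: Dict[str, Feature] = {}
    best = 0.0
    for round_idx in range(max_rounds):
        improved = False
        for name, fn in generate_features(selected, round_idx).items():
            if name in kept:  # stand-in for reasoning-based filtering: skip duplicates
                continue
            score = validate({**kept, name: fn})
            if score > best + tol:  # validation-based filtering: keep only if it helps
                kept[name] = fn
                best = score
                improved = True
        if not improved:  # treated here as convergence
            break
    return kept, best


kept_features, final_score = refine()
print(sorted(kept_features), round(final_score, 3))
```

In a real system each stub would be backed by an LLM call and the validation score would come from training a downstream model on a held-out split; the sketch only shows how selection, generation, and filtering could feed one another across rounds.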