AI | CL | LG | MA | Mar 30, 2025

SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science

arXiv:2503.23314v1 | 8 citations | h-index: 3
Originality: Highly original
AI Analysis

This work addresses the need for more flexible and optimal automated data science pipelines, offering a scalable solution for data analysts and practitioners.

The paper tackles the problem of rigid, single-path workflows in automated data science. It proposes SPIO, a framework that uses LLM-driven multi-agent planning to explore diverse strategies across data preprocessing, feature engineering, modeling, and hyperparameter tuning. In the reported experiments on Kaggle and OpenML datasets, SPIO significantly outperforms state-of-the-art methods.

Large Language Models (LLMs) have revolutionized automated data analytics and machine learning by enabling dynamic reasoning and adaptability. While recent approaches have advanced multi-stage pipelines through multi-agent systems, they typically rely on rigid, single-path workflows that limit the exploration and integration of diverse strategies, often resulting in suboptimal predictions. To address these challenges, we propose SPIO (Sequential Plan Integration and Optimization), a novel framework that leverages LLM-driven decision-making to orchestrate multi-agent planning across four key modules: data preprocessing, feature engineering, modeling, and hyperparameter tuning. In each module, dedicated planning agents independently generate candidate strategies that cascade into subsequent stages, fostering comprehensive exploration. A plan optimization agent refines these strategies by suggesting several optimized plans. We further introduce two variants: SPIO-S, which selects a single best solution path as determined by the LLM, and SPIO-E, which selects the top k candidate plans and ensembles them to maximize predictive performance. Extensive experiments on Kaggle and OpenML datasets demonstrate that SPIO significantly outperforms state-of-the-art methods, providing a robust and scalable solution for automated data science tasks.
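
The difference between the two variants can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `Plan`, `enumerate_paths`, `select_single`, `select_top_k`, and `ensemble_mean` are hypothetical, a numeric validation score stands in for the LLM's judgment when ranking plans, and simple prediction averaging stands in for whatever ensembling SPIO-E actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from statistics import mean

# The four modules named in the abstract.
STAGES = ("preprocessing", "feature_engineering", "modeling", "hyperparameter_tuning")


@dataclass(frozen=True)
class Plan:
    """Hypothetical container for one end-to-end candidate plan."""
    steps: tuple[str, ...]          # one chosen strategy per stage
    score: float                    # assumed validation score (higher is better)
    predictions: tuple[float, ...]  # assumed predictions on a held-out set


def enumerate_paths(candidates: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Cascade per-stage candidate strategies into full pipeline paths."""
    return list(product(*(candidates[stage] for stage in STAGES)))


def select_single(plans: list[Plan]) -> Plan:
    """SPIO-S-style choice: keep one plan (score is a stand-in for the LLM's pick)."""
    return max(plans, key=lambda p: p.score)


def select_top_k(plans: list[Plan], k: int) -> list[Plan]:
    """SPIO-E-style choice: keep the k highest-ranked plans."""
    return sorted(plans, key=lambda p: p.score, reverse=True)[:k]


def ensemble_mean(plans: list[Plan]) -> list[float]:
    """Average predictions position by position (assumed ensembling rule)."""
    return [mean(column) for column in zip(*(p.predictions for p in plans))]


if __name__ == "__main__":
    # Toy plans with made-up scores and predictions, for illustration only.
    plans = [
        Plan(("impute", "poly", "gbm", "grid"), 0.8, (1.0, 0.0)),
        Plan(("impute", "none", "rf", "random"), 0.9, (0.0, 1.0)),
        Plan(("drop", "none", "linear", "none"), 0.5, (1.0, 1.0)),
    ]
    print(select_single(plans).steps)
    print(ensemble_mean(select_top_k(plans, k=2)))
```

The single-path variant commits to one plan, while the ensemble variant hedges across several plans, so one weak strategy choice in an early stage is less likely to dominate the final prediction.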

