AI · Nov 6, 2025

ArchPilot: A Proxy-Guided Multi-Agent Approach for Machine Learning Engineering

Zhuowen Yuan, Tao Liu, Yang Yang, Yang Wang, Feng Qi, Kaushik Rangadurai, Bo Li, Shuang Yang

arXiv:2511.03985v1 · 3.3 · h-index: 5

Originality: Highly original

AI Analysis

This work addresses the high computational cost and slow iteration cycles of automated ML engineering, a problem that affects both researchers and practitioners. It represents an incremental improvement, delivered through a novel multi-agent framework.

The paper tackles the computational inefficiency of LLM-based agents in automated ML engineering by introducing ArchPilot, a multi-agent system that uses proxy-based evaluation and adaptive search to reduce reliance on full training runs, achieving superior performance on MLE-Bench compared to SOTA baselines.

Recent LLM-based agents have demonstrated strong capabilities in automated ML engineering. However, they heavily rely on repeated full training runs to evaluate candidate solutions, resulting in significant computational overhead, limited scalability to large search spaces, and slow iteration cycles. To address these challenges, we introduce ArchPilot, a multi-agent system that integrates architecture generation, proxy-based evaluation, and adaptive search into a unified framework. ArchPilot consists of three specialized agents: an orchestration agent that coordinates the search process using a Monte Carlo Tree Search (MCTS)-inspired novel algorithm with a restart mechanism and manages memory of previous candidates; a generation agent that iteratively generates, improves, and debugs candidate architectures; and an evaluation agent that executes proxy training runs, generates and optimizes proxy functions, and aggregates the proxy scores into a fidelity-aware performance metric. This multi-agent collaboration allows ArchPilot to prioritize high-potential candidates with minimal reliance on expensive full training runs, facilitating efficient ML engineering under limited budgets. Experiments on MLE-Bench demonstrate that ArchPilot outperforms SOTA baselines such as AIDE and ML-Master, validating the effectiveness of our multi-agent system.
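The abstract does not spell out how proxy scores are aggregated or how the orchestration agent ranks candidates, so the following is only a minimal sketch under stated assumptions, not ArchPilot's actual method. It assumes a fidelity-weighted mean as the "fidelity-aware" aggregate and the textbook UCB1 rule as the MCTS-style selection step. All function names, the dictionary layout, and the exploration constant `c` are hypothetical.

```python
import math


def aggregate_proxy_scores(scores, fidelities):
    """Fidelity-weighted mean of proxy scores.

    Assumption: each proxy has a non-negative fidelity weight that says how
    much its score is trusted. The paper's real aggregation may differ.
    """
    if not scores or len(scores) != len(fidelities):
        raise ValueError("scores and fidelities must be non-empty and of equal length")
    total = sum(fidelities)
    if total <= 0:
        raise ValueError("at least one fidelity weight must be positive")
    return sum(s * f for s, f in zip(scores, fidelities)) / total


def ucb1(mean_score, visits, parent_visits, c=1.4):
    """Textbook UCB1 value; unvisited candidates return +inf so they are tried first."""
    if visits == 0:
        return math.inf
    return mean_score + c * math.sqrt(math.log(parent_visits) / visits)


def select_candidate(candidates, parent_visits, c=1.4):
    """Return the name of the candidate with the highest UCB1 value.

    `candidates` maps a name to {"mean": float, "visits": int} (hypothetical layout).
    """
    return max(
        candidates,
        key=lambda name: ucb1(
            candidates[name]["mean"], candidates[name]["visits"], parent_visits, c
        ),
    )


if __name__ == "__main__":
    # Two proxies: a cheap low-fidelity one and a more trusted one.
    print(aggregate_proxy_scores([0.5, 0.9], [1.0, 3.0]))

    pool = {
        "arch_a": {"mean": 0.8, "visits": 10},
        "arch_b": {"mean": 0.6, "visits": 1},
    }
    # The rarely visited candidate gets a larger exploration bonus.
    print(select_candidate(pool, parent_visits=11))
```

In this sketch the exploration bonus lets a rarely evaluated candidate be selected over one with a higher mean score, which is one simple way to spread a limited evaluation budget across a search tree.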
