AI | May 20, 2025

DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation

He Wang, Alexander Hanbo Li, Yiqun Hu, Sheng Zhang, Hideo Kobayashi, Jiani Zhang, Henry Zhu, Chung-Wei Hang, Patrick Ng

arXiv:2505.14163v111.13 citations | h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enhancing inference-time strategies for data science agents, offering an incremental improvement over existing methods by optimizing task order and memory usage.

The paper proposes DSMentor, a framework that improves LLM agent performance on complex data science tasks by applying curriculum learning and online knowledge accumulation at inference time. With Claude-3.5-Sonnet, it improves the pass rate by up to 5.2% on DSEval and QRData over baseline agents, and by 8.8% on causality problems over GPT-4 with Program-of-Thoughts prompts.

Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. Recent studies primarily focus on enhancing in-context learning through improved search, sampling, and planning techniques, while overlooking the importance of the order in which problems are tackled during inference. In this work, we develop a novel inference-time optimization framework, referred to as DSMentor, which leverages curriculum learning, a strategy that introduces simpler tasks first and progressively moves to more complex ones as the learner improves, to enhance LLM agent performance in challenging data science tasks. Our mentor-guided framework organizes data science tasks in order of increasing difficulty and incorporates a growing long-term memory to retain prior experiences, guiding the agent's learning progression and enabling more effective utilization of accumulated knowledge. We evaluate DSMentor through extensive experiments on the DSEval and QRData benchmarks. Experiments show that DSMentor using Claude-3.5-Sonnet improves the pass rate by up to 5.2% on DSEval and QRData compared to baseline agents. Furthermore, DSMentor demonstrates stronger causal reasoning ability, improving the pass rate by 8.8% on the causality problems compared to GPT-4 using Program-of-Thoughts prompts. Our work underscores the importance of developing effective strategies for accumulating and utilizing knowledge during inference, mirroring the human learning process and opening new avenues for improving LLM performance through curriculum-based inference optimization.
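
The inference loop described in the abstract (order tasks from easy to hard, solve each with help from a growing memory, then store the result) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Task`, `Memory`, and `run_curriculum` names, the numeric `difficulty` score, the recency-based retrieval, and the stub solver are all assumptions made for illustration, since the abstract does not specify how difficulty is assigned or how memory is queried.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, answer) pair kept in memory


@dataclass
class Task:
    question: str
    difficulty: float  # hypothetical mentor-assigned score; higher means harder


@dataclass
class Memory:
    """Growing long-term memory of previously attempted tasks (assumed design)."""
    entries: List[Example] = field(default_factory=list)

    def retrieve(self, k: int) -> List[Example]:
        # Placeholder policy: return the k most recent entries.
        # A real system might retrieve by similarity instead.
        return self.entries[-k:] if k > 0 else []

    def add(self, question: str, answer: str) -> None:
        self.entries.append((question, answer))


def run_curriculum(
    tasks: List[Task],
    solve: Callable[[str, List[Example]], str],
    k: int = 2,
) -> List[Example]:
    """Solve tasks in order of increasing difficulty, accumulating memory."""
    memory = Memory()
    results: List[Example] = []
    for task in sorted(tasks, key=lambda t: t.difficulty):
        examples = memory.retrieve(k)            # prior experience as context
        answer = solve(task.question, examples)  # stand-in for the LLM agent call
        memory.add(task.question, answer)        # memory grows after every task
        results.append((task.question, answer))
    return results


def echo_solver(question: str, examples: List[Example]) -> str:
    # Stub solver that only reports how many examples it was given.
    return f"{question}|seen={len(examples)}"
```

Calling `run_curriculum([Task("hard", 3.0), Task("easy", 1.0)], echo_solver)` processes "easy" before "hard", and the second call to the solver receives the first task's result as context. Swapping `echo_solver` for a function that prompts an LLM with the retrieved examples would be the natural next step under these assumptions.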
