AI | Aug 13, 2025

EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making

arXiv:2508.09586v2 | 3 citations | h-index: 3
Originality: Highly original
AI Analysis

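EvoCurr targets the inefficiency of LLMs on high-complexity decision-making tasks with a novel self-evolving curriculum method; the paper reports improved task success rates and solution efficiency over direct-solving baselines.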

The paper tackles the degradation of LLM performance on complex decision-making tasks. It proposes EvoCurr, a self-evolving curriculum framework that generates problem sequences tailored to the solver, and reports significant improvements in task success rates and solution efficiency compared to direct-solving baselines.

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, including programming, planning, and decision-making. However, their performance often degrades when faced with highly complex problem instances that require deep reasoning over long horizons. In such cases, direct problem-solving approaches can lead to inefficiency or failure due to the lack of structured intermediate guidance. To address this, we propose a novel self-evolving framework, EvoCurr, in which a dedicated curriculum-generation LLM constructs a sequence of problem instances with gradually increasing difficulty, tailored to the solver LLM's learning progress. The curriculum adapts dynamically, easing challenges when the solver struggles and escalating them when success is consistent, thus maintaining an optimal learning trajectory. This approach enables the solver LLM, implemented as a code-generation model producing Python decision-tree scripts, to progressively acquire the skills needed for complex decision-making tasks. Experimental results on challenging decision-making benchmarks show that our method significantly improves task success rates and solution efficiency compared to direct-solving baselines. These findings suggest that LLM-driven curriculum learning holds strong potential for enhancing automated reasoning in real-world, high-complexity domains.
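
The adaptive loop described in the abstract (ease the task when the solver struggles, escalate when success is consistent) can be illustrated with a small sketch. This is not the authors' implementation. It assumes difficulty is a single integer level, that "consistent success" means a fixed number of consecutive successes, and that the solver is any callable returning True or False. The names `Curriculum`, `run_curriculum`, `ToySolver`, and `streak_needed` are hypothetical, and the toy solver stands in for the code-generating solver LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Curriculum:
    """Hypothetical difficulty scheduler; integer levels are an assumption."""
    target: int              # difficulty of the final task
    level: int = 1           # current difficulty
    step: int = 1            # how far to move up or down per adjustment
    streak_needed: int = 2   # consecutive successes required to escalate
    _streak: int = 0
    history: list = field(default_factory=list)

    def update(self, success: bool) -> int:
        """Record one attempt and return the next difficulty level."""
        self.history.append((self.level, success))
        if success:
            self._streak += 1
            if self._streak >= self.streak_needed:
                # Success is consistent: escalate, capped at the target.
                self.level = min(self.target, self.level + self.step)
                self._streak = 0
        else:
            # Solver struggles: ease the challenge, never below level 1.
            self._streak = 0
            self.level = max(1, self.level - self.step)
        return self.level


def run_curriculum(solve, target, max_rounds=50, streak_needed=2):
    """Run until the solver succeeds at the target level or rounds run out."""
    cur = Curriculum(target=target, streak_needed=streak_needed)
    for _ in range(max_rounds):
        at_target = cur.level == target
        success = solve(cur.level)
        cur.update(success)
        if success and at_target:
            return True, cur.history
    return False, cur.history


class ToySolver:
    """Stand-in solver: it can solve tasks at most one level above its skill."""
    def __init__(self):
        self.skill = 0

    def __call__(self, level: int) -> bool:
        ok = level <= self.skill + 1
        if ok:
            self.skill = max(self.skill, level)  # skill grows with each success
        return ok


if __name__ == "__main__":
    solved, history = run_curriculum(ToySolver(), target=4)
    print("curriculum solved target:", solved)
    print("levels attempted:", [lvl for lvl, _ in history])
    print("direct attempt at target:", ToySolver()(4))
```

In this toy setting a fresh solver fails when given the level-4 task directly but reaches it when stepped through the easier levels first, which mirrors the contrast with direct-solving baselines only qualitatively.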
