PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs
PathWise addresses myopic and inefficient heuristic generation in combinatorial optimization, a concern for both researchers and practitioners. It is presented as a novel method rather than an incremental improvement.
The paper tackles automated heuristic design for combinatorial optimization by proposing PathWise, a multi-agent reasoning framework that formulates heuristic generation as a sequential decision process. The reported results are faster convergence to better heuristics and improved generalization across different LLM backbones and problem sizes.
Large Language Models (LLMs) have enabled automated heuristic design (AHD) for combinatorial optimization problems (COPs), but existing frameworks' reliance on fixed evolutionary rules and static prompt templates often leads to myopic heuristic generation, redundant evaluations, and limited reasoning about how new heuristics should be derived. We propose a novel multi-agent reasoning framework, referred to as Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs (PathWise), which formulates heuristic generation as a sequential decision process over an entailment graph serving as a compact, stateful memory of the search trajectory. This approach allows the system to carry forward past decisions and reuse or avoid derivation information across generations. A policy agent plans evolutionary actions, a world model agent generates heuristic rollouts conditioned on those actions, and critic agents provide routed reflections summarizing lessons from prior steps, shifting LLM-based AHD from trial-and-error evolution toward state-aware planning through reasoning. Experiments across diverse COPs show that PathWise converges faster to better heuristics, generalizes across different LLM backbones, and scales to larger problem sizes.
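The plan, rollout, and reflect loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Node`, `EntailmentGraph`, `policy_agent`, `world_model_agent`, `critic_agent`, `run_search`) is hypothetical, the three agents are deterministic stubs standing in for LLM calls, a "heuristic" is reduced to a single number, and `evaluate` is a toy objective rather than a run on COP instances. Reflections are only logged, not routed back to the agents. The sketch shows one idea only: the graph records how each heuristic was derived, so the policy can skip derivations it has already tried.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Tuple


def evaluate(heuristic: float) -> float:
    """Toy objective (lower is better); stands in for evaluating a heuristic."""
    return (heuristic - 3.0) ** 2


@dataclass
class Node:
    """One heuristic plus the derivation that produced it (hypothetical)."""
    node_id: int
    heuristic: float
    score: float
    parents: Tuple[int, ...] = ()
    action: str = "init"


@dataclass
class EntailmentGraph:
    """Stateful memory of the search trajectory (hypothetical structure)."""
    nodes: Dict[int, Node] = field(default_factory=dict)
    reflections: List[str] = field(default_factory=list)

    def add(self, heuristic: float, parents=(), action: str = "init") -> Node:
        node = Node(len(self.nodes), heuristic, evaluate(heuristic),
                    tuple(parents), action)
        self.nodes[node.node_id] = node
        return node

    def ranked(self) -> List[Node]:
        return sorted(self.nodes.values(), key=lambda n: n.score)

    def best(self) -> Node:
        return self.ranked()[0]


def policy_agent(graph: EntailmentGraph) -> Tuple[str, Tuple[int, ...]]:
    """Plan the next action from graph state, skipping derivations already tried."""
    tried = {(n.action, n.parents) for n in graph.nodes.values()}
    ids = [n.node_id for n in graph.ranked()]
    for pair in combinations(ids, 2):  # best-scoring pairs come first
        parents = tuple(sorted(pair))
        if ("crossover", parents) not in tried:
            return "crossover", parents
    return "mutate", (ids[0],)  # fallback, e.g. when only one node exists


def world_model_agent(graph: EntailmentGraph, action: str,
                      parents: Tuple[int, ...]) -> float:
    """Generate a candidate heuristic conditioned on the planned action (stub)."""
    values = [graph.nodes[p].heuristic for p in parents]
    if action == "crossover":
        return sum(values) / len(values)
    return values[0] + 1.0


def critic_agent(graph: EntailmentGraph, child: Node) -> str:
    """Summarize what the last step showed (stub for an LLM reflection)."""
    best_parent = min(graph.nodes[p].score for p in child.parents)
    verdict = "improved on" if child.score < best_parent else "did not improve on"
    return f"{child.action} of {child.parents} {verdict} its parents"


def run_search(steps: int, seeds=(0.0, 10.0)) -> EntailmentGraph:
    graph = EntailmentGraph()
    for seed in seeds:
        graph.add(seed)
    for _ in range(steps):
        action, parents = policy_agent(graph)
        candidate = world_model_agent(graph, action, parents)
        child = graph.add(candidate, parents, action)
        graph.reflections.append(critic_agent(graph, child))
    return graph


if __name__ == "__main__":
    result = run_search(steps=5)
    print(result.best())
    print(*result.reflections, sep="\n")
```

Keeping `parents` and `action` on every node is what makes the memory stateful in this sketch: the policy reads the derivation history directly from the graph instead of re-deriving or re-evaluating the same combination.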