Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery
This addresses the issue of overfitting and poor generalization in heuristic discovery for combinatorial optimization, offering a novel approach to enhance robustness under distributional shifts.
The paper tackled the problem of static evaluation in automatic heuristic discovery (AHD) by proposing Algorithm Space Response Oracles (ASRO), a game-theoretic framework that reframes heuristic discovery as co-evolution between solver and instance generator, resulting in substantially improved generalization and robustness across multiple combinatorial optimization domains compared to static-training baselines.
Large language models (LLMs) have enabled rapid progress in automatic heuristic discovery (AHD), yet most existing methods are predominantly limited by static evaluation against fixed instance distributions, leading to potential overfitting and poor generalization under distributional shifts. We propose Algorithm Space Response Oracles (ASRO), a game-theoretic framework that reframes heuristic discovery as a program level co-evolution between solver and instance generator. ASRO models their interaction as a two-player zero-sum game, maintains growing strategy pools on both sides, and iteratively expands them via LLM-based best-response oracles against mixed opponent meta-strategies, thereby replacing static evaluation with an adaptive, self-generated curriculum. Across multiple combinatorial optimization domains, ASRO consistently outperforms static-training AHD baselines built on the same program search mechanisms, achieving substantially improved generalization and robustness on diverse and out-of-distribution instances.