LGAIAug 27, 2025

Learning Game-Playing Agents with Generative Code Optimization

arXiv:2508.19506v1
Originality Incremental advance
AI Analysis

This work addresses the challenge of building efficient and adaptable agents for complex reasoning tasks, though it is incremental in applying generative methods to programmatic policies.

The authors tackled the problem of learning game-playing agents by representing policies as Python programs and optimizing them with large language models, achieving performance competitive with deep reinforcement learning baselines on Atari games while using significantly less training time and fewer environment interactions.

We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and much fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes