AI · Jun 30, 2025

Self-correcting Reward Shaping via Language Models for Reinforcement Learning Agents in Games

arXiv:2506.23626v1 · 1 citation · h-index: 7
Originality: Incremental advance
AI Analysis

This work aims to reduce the manual effort required of RL experts in game development. The advance is incremental because it builds on existing LM and RL methods.

The paper automates reward-function tuning for reinforcement learning agents in games whose content changes: a language model iteratively adjusts the reward weights toward a stated behavioral goal. The resulting agents reach a success rate of up to 80% and lap times competitive with expert tuning.

Reinforcement Learning (RL) in games has gained significant momentum in recent years, enabling the creation of different agent behaviors that can transform a player's gaming experience. However, deploying RL agents in production environments presents two key challenges: (1) designing an effective reward function typically requires an RL expert, and (2) when a game's content or mechanics are modified, previously tuned reward weights may no longer be optimal. To address the latter challenge, we propose an automated approach for iteratively fine-tuning an RL agent's reward function weights, based on a user-defined, language-based behavioral goal. A Language Model (LM) proposes updated weights at each iteration based on this target behavior and a summary of performance statistics from prior training rounds. This closed-loop process allows the LM to self-correct and refine its output over time, producing increasingly aligned behavior without the need for manual reward engineering. We evaluate our approach in a racing task and show that it consistently improves agent performance across iterations. The LM-guided agents show a significant increase in success rate, from $9\%$ to $74\%$, in just one iteration. We compare our LM-guided tuning against a human expert's manual weight design in the racing task: by the final iteration, the LM-tuned agent achieved an $80\%$ success rate and completed laps in an average of $855$ time steps, which is competitive with the expert-tuned agent's peak of $94\%$ success and $850$ time steps.
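
The closed loop described in the abstract (train with the current weights, summarize the statistics, let the LM propose new weights, repeat) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `Round`, `build_prompt`, `tune_reward_weights`, `toy_train_and_evaluate`, and `toy_propose`, the weight name `progress`, and the prompt wording are all hypothetical. The training step and the LM are replaced by deterministic toy stand-ins so the loop runs without an RL environment or a model API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Weights = Dict[str, float]
Stats = Dict[str, float]


@dataclass
class Round:
    """One training round: the weights used and the statistics observed."""
    weights: Weights
    stats: Stats


def build_prompt(goal: str, history: List[Round]) -> str:
    """Summarize the behavioral goal and all prior rounds as text (hypothetical format)."""
    lines = [f"Behavioral goal: {goal}", "Prior rounds:"]
    for i, r in enumerate(history, start=1):
        lines.append(f"  round {i}: weights={r.weights} stats={r.stats}")
    lines.append("Propose updated reward weights.")
    return "\n".join(lines)


def tune_reward_weights(
    goal: str,
    initial_weights: Weights,
    propose: Callable[[str, List[Round]], Weights],
    train_and_evaluate: Callable[[Weights], Stats],
    iterations: int,
) -> List[Round]:
    """Run the closed loop: train, record statistics, ask for new weights."""
    history: List[Round] = []
    weights = dict(initial_weights)
    for _ in range(iterations):
        stats = train_and_evaluate(weights)          # stand-in for an RL training round
        history.append(Round(dict(weights), stats))
        weights = propose(build_prompt(goal, history), history)
    return history


def toy_train_and_evaluate(weights: Weights) -> Stats:
    """Toy assumption: success peaks when the 'progress' weight equals 1.0."""
    return {"success_rate": max(0.0, 1.0 - abs(weights["progress"] - 1.0))}


def toy_propose(prompt: str, history: List[Round]) -> Weights:
    """Toy stand-in for the LM: ignores the prompt text and nudges one weight upward."""
    last = history[-1]
    new_weights = dict(last.weights)
    if last.stats["success_rate"] < 1.0:
        new_weights["progress"] += 0.25
    return new_weights


history = tune_reward_weights(
    goal="finish the lap quickly without crashing",
    initial_weights={"progress": 0.25},
    propose=toy_propose,
    train_and_evaluate=toy_train_and_evaluate,
    iterations=3,
)
for i, r in enumerate(history, start=1):
    print(i, r.weights, r.stats)
```

In a real setup, `propose` would send the prompt to a language model and parse weights from its reply, and `train_and_evaluate` would run a full training round and report statistics such as success rate and lap time. Passing `history` to `propose` alongside the prompt is a convenience for the toy stand-in only.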
