AI
Jun 20, 2025

Style-Preserving Policy Optimization for Game Agents

arXiv:2506.16995v3
h-index: 4
Originality: Incremental advance
AI Analysis

This work aims to make games more engaging and replayable by producing AI agents that are both proficient and stylistically diverse. It is an incremental improvement over existing methods.

The paper tackles the problem of generating game agents that are both proficient and stylistically diverse. It proposes Mixed Proximal Policy Optimization (MPPO), which raises the proficiency of suboptimal agents to match or exceed that of pure online algorithms while preserving their distinct play styles.

Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advancements in game AI based on reinforcement learning (RL) have predominantly focused on improving proficiency, whereas methods based on evolutionary algorithms generate agents with diverse play styles but exhibit subpar performance compared to RL methods. To address this gap, this paper proposes Mixed Proximal Policy Optimization (MPPO), a method designed to improve the proficiency of existing suboptimal agents while retaining their distinct styles. MPPO unifies loss objectives for both online and offline samples and introduces an implicit constraint to approximate demonstrator policies by adjusting the empirical distribution of samples. Empirical results across environments of varying scales demonstrate that MPPO achieves proficiency levels comparable to, or even superior to, pure online algorithms while preserving demonstrators' play styles. This work presents an effective approach for generating highly proficient and diverse game agents, ultimately contributing to more engaging gameplay experiences.
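The abstract does not state MPPO's exact loss or sampling rule, so the following is only an illustrative sketch and not the paper's implementation. It assumes a standard PPO clipped surrogate loss applied uniformly to a batch that mixes online rollouts with demonstrator (offline) samples. The function names and the `offline_fraction` parameter are hypothetical, chosen here to show how changing the share of demonstrator samples in a batch could act as an implicit pull toward the demonstrator's behavior.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, returned as a value to minimize."""
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    # Probability ratio between the current policy and the behavior policy.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (minimum) term; negate to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))


def mixed_batch_indices(n_online, n_offline, batch_size, offline_fraction, rng):
    """Hypothetical helper: choose indices so that roughly `offline_fraction`
    of each batch comes from demonstrator data (an assumption, not from the paper)."""
    k_off = int(round(batch_size * offline_fraction))
    k_on = batch_size - k_off
    on_idx = rng.integers(0, n_online, size=k_on)
    off_idx = rng.integers(0, n_offline, size=k_off)
    return on_idx, off_idx


def mixed_loss(online, offline, on_idx, off_idx, clip_eps=0.2):
    """Apply one loss to the concatenated online and offline samples.
    `online` and `offline` are dicts with keys 'logp_new', 'logp_old', 'adv'."""
    def gather(key):
        return np.concatenate([
            np.asarray(online[key])[on_idx],
            np.asarray(offline[key])[off_idx],
        ])
    return ppo_clip_loss(gather("logp_new"), gather("logp_old"), gather("adv"), clip_eps)
```

In this sketch, raising `offline_fraction` shifts the empirical sample distribution toward the demonstrator's data without adding an explicit imitation penalty to the loss. How MPPO actually weights or corrects offline samples should be taken from the paper itself.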
