LG · Dec 25, 2025

Generative Actor Critic

Aoyang Qin, Deqian Kong, Wei Wang, Ying Nian Wu, Song-Chun Zhu, Sirui Xie

arXiv:2512.21527v1 · 9.42 citations · h-index: 5

Originality Highly original

AI Analysis

This work addresses a key problem for reinforcement learning researchers and practitioners: improving offline-to-online adaptation. The contribution is incremental in that it builds on existing RL methods, though it recasts them in a novel framework.

The paper tackles the challenge of refining offline pretrained models with online experience in reinforcement learning. It introduces the Generative Actor Critic (GAC) framework, which decouples policy evaluation from policy improvement through generative modeling and inference. The reported result is strong offline performance and significantly enhanced offline-to-online improvement on the Gym-MuJoCo and Maze2D benchmarks.

Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when refining offline pretrained models with online experiences. This paper introduces Generative Actor Critic (GAC), a novel framework that decouples sequential decision-making by reframing policy evaluation as learning a generative model of the joint distribution over trajectories and returns, $p(\tau, y)$, and policy improvement as performing versatile inference on this learned model. To operationalize GAC, we introduce a specific instantiation based on a latent variable model that features continuous latent plan vectors. We develop novel inference strategies for both exploitation, by optimizing latent plans to maximize expected returns, and exploration, by sampling latent plans conditioned on dynamically adjusted target returns. Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods, even in the absence of step-wise rewards.
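The two inference modes named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical return predictor over a latent plan vector z (here a hand-written concave quadratic standing in for a learned model), uses plain gradient ascent for exploitation, and uses a crude nearest-candidate selection from a standard normal prior as a stand-in for sampling plans conditioned on a target return. All function names and parameters are illustrative.

```python
import numpy as np

def predicted_return(z, peak):
    """Hypothetical stand-in for a learned estimate of E[y | z].

    A concave quadratic whose maximum (value 0) sits at `peak`.
    """
    return -np.sum((z - peak) ** 2)

def grad_predicted_return(z, peak):
    """Analytic gradient of the toy return predictor with respect to z."""
    return -2.0 * (z - peak)

def optimize_latent_plan(z0, peak, step_size=0.1, num_steps=200):
    """Exploitation sketch: gradient ascent on the latent plan z."""
    z = np.array(z0, dtype=float)
    for _ in range(num_steps):
        z = z + step_size * grad_predicted_return(z, peak)
    return z

def sample_plan_for_target(target_return, peak, rng, num_candidates=1000):
    """Exploration sketch: draw plans from an assumed N(0, I) prior and keep
    the one whose predicted return is closest to the target return."""
    candidates = rng.standard_normal((num_candidates, peak.shape[0]))
    returns = -np.sum((candidates - peak) ** 2, axis=1)
    best = np.argmin(np.abs(returns - target_return))
    return candidates[best], returns[best]

if __name__ == "__main__":
    peak = np.array([1.0, -2.0])  # assumed optimum of the toy predictor
    z_star = optimize_latent_plan(np.zeros(2), peak)
    print("exploitation plan:", z_star)

    rng = np.random.default_rng(0)
    z_explore, r_explore = sample_plan_for_target(-3.0, peak, rng)
    print("exploration plan:", z_explore, "predicted return:", r_explore)
```

In this toy setting, raising the target return passed to `sample_plan_for_target` over time would mimic the "dynamically adjusted target returns" the abstract mentions. How the paper actually schedules targets and decodes plans into actions is not specified in this excerpt.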
