LG · RO · May 18, 2023

Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models

arXiv:2305.11340v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses key challenges in offline reinforcement learning that bear on generalization and safety in decision-making systems, and represents a strong incremental advance.

The paper tackles two limitations of reward-conditioned reinforcement learning (RCRL): poor generalization to high reward-to-go inputs and out-of-distribution queries at test time. It proposes Bayesian Reparameterized RCRL (BR-RCRL), which improves performance over vanilla RCRL by up to 11% on the Gym-Mujoco and Atari offline RL benchmarks.

Recently, reward-conditioned reinforcement learning (RCRL) has gained popularity due to its simplicity, flexibility, and off-policy nature. However, we will show that current RCRL approaches are fundamentally limited and fail to address two critical challenges of RCRL: improving generalization on high reward-to-go (RTG) inputs, and avoiding out-of-distribution (OOD) RTG queries at test time. To address these challenges when training vanilla RCRL architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set of inductive biases for RCRL inspired by Bayes' theorem. BR-RCRL removes a core obstacle preventing vanilla RCRL from generalizing on high RTG inputs: the model's tendency to treat different RTG inputs as independent values, which we term "RTG Independence". BR-RCRL also allows us to design an accompanying adaptive inference method, which maximizes total returns while avoiding OOD queries that yield unpredictable behaviors in vanilla RCRL methods. We show that BR-RCRL achieves state-of-the-art performance on the Gym-Mujoco and Atari offline RL benchmarks, improving upon vanilla RCRL by up to 11%.
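To make the Bayes'-theorem idea concrete, here is a minimal sketch, assuming discrete actions, a return-to-go discretized into bins, and a tabulated energy E(s, a, R) for a single fixed state. It computes p(a | s, R) ∝ p(R | s, a) p(a | s), with p(R | s, a) given by a softmax over negative energies, and then picks a query RTG from the marginal p(R | s). The function names, the energy table, and the quantile-based selection rule are illustrative assumptions, not the paper's exact architecture or its adaptive inference procedure. Note that a lookup table still treats each RTG bin independently; a learned energy network over (s, a, R) is what would let nearby returns share information.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def rtg_likelihood(energy: List[List[float]]) -> List[List[float]]:
    """p(R_r | s, a) = softmax(-E(s, a, .))[r] for each action a (state s fixed).

    energy[a][r] is a hypothetical energy E(s, a, R_r); lower means more likely.
    """
    return [softmax([-e for e in row]) for row in energy]


def rtg_marginal(prior_a: List[float], energy: List[List[float]]) -> List[float]:
    """p(R_r | s) = sum_a p(R_r | s, a) * p(a | s) for every RTG bin r."""
    lik = rtg_likelihood(energy)
    n_bins = len(lik[0])
    return [sum(lik[a][r] * prior_a[a] for a in range(len(prior_a)))
            for r in range(n_bins)]


def action_posterior(prior_a: List[float], energy: List[List[float]],
                     rtg_bin: int) -> List[float]:
    """Bayes' rule: p(a | s, R) = p(R | s, a) * p(a | s) / p(R | s)."""
    lik = rtg_likelihood(energy)
    joint = [lik[a][rtg_bin] * p for a, p in enumerate(prior_a)]
    evidence = sum(joint)  # equals p(R | s) for the queried bin
    return [j / evidence for j in joint]


def select_rtg_bin(marginal: List[float], quantile: float = 0.9) -> int:
    """Hypothetical adaptive rule: return the RTG bin at the given quantile of
    p(R | s), i.e. a high return that still has support in the data."""
    cdf = 0.0
    for r, p in enumerate(marginal):
        cdf += p
        if cdf >= quantile:
            return r
    return len(marginal) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    prior = [0.5, 0.5]              # behaviour policy p(a | s), two actions
    energy = [[0.0, 1.0, 2.0],      # action 0 favours low RTG bins
              [2.0, 1.0, 0.0]]      # action 1 favours high RTG bins
    r = select_rtg_bin(rtg_marginal(prior, energy), quantile=0.9)
    print(r, action_posterior(prior, energy, r))
```

In the toy example, conditioning on a high RTG bin shifts probability toward the action whose energy favours high returns, while the quantile rule keeps the queried return inside the mass of p(R | s) rather than at an unsupported extreme.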
