LG, Mar 5

Reward-Conditioned Reinforcement Learning

arXiv:2603.05066v1
Originality: Incremental advance
AI Analysis

This work addresses two weaknesses of RL agents trained under a fixed reward: brittleness to reward misspecification and limited adaptability to changing task preferences. Both matter to practitioners deploying RL in dynamic environments.

This paper introduces Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains one agent to optimize a family of reward specifications using experience collected under a single nominal objective. The resulting policy represents reward-specific behaviors, improves performance under the nominal reward, and adapts efficiently to new parameterizations across single-task, multi-task, and vision-based benchmarks.

RL agents are typically trained under a single, fixed reward function, which makes them brittle to reward misspecification and limits their ability to adapt to changing task preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains a single agent to optimize a family of reward specifications while collecting experience under only one nominal objective. RCRL conditions the agent on reward parameterizations and learns multiple reward objectives from shared replay data entirely off-policy, enabling a single policy to represent reward-specific behaviors. Across single-task, multi-task, and vision-based benchmarks, we show that RCRL not only improves performance under the nominal reward parameterization, but also enables efficient adaptation to new parameterizations. Our results demonstrate that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.
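To make the idea of learning several reward objectives from one replay buffer concrete, here is a minimal Python sketch of reward relabeling. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the reward family is parameterized, so the sketch assumes a linear family r_w = w · phi, where phi is a vector of per-step reward components stored with each transition. The names Transition, reward_features, relabel, and sample_weights are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    """One replay entry collected while acting under the nominal reward."""
    state: Tuple[float, ...]
    action: int
    reward_features: Tuple[float, ...]  # hypothetical per-step reward components (phi)
    next_state: Tuple[float, ...]
    done: bool


def reward(features: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed linear reward family: r_w = sum_i w_i * phi_i."""
    return sum(w * f for w, f in zip(weights, features))


def sample_weights(nominal: Sequence[float], scale: float,
                   rng: random.Random) -> Tuple[float, ...]:
    """Draw a reward parameterization near the nominal one (an assumed sampling scheme)."""
    return tuple(w + rng.uniform(-scale, scale) for w in nominal)


def relabel(batch: List[Transition], weights: Sequence[float]):
    """Recompute rewards under `weights` and append `weights` to each observation.

    The returned tuples (obs, action, reward, next_obs, done) can be fed to any
    off-policy learner; the appended weights are what the agent is conditioned on.
    """
    w = tuple(weights)
    return [
        (t.state + w, t.action, reward(t.reward_features, w), t.next_state + w, t.done)
        for t in batch
    ]


# Usage: one stored transition, relabeled under two different parameterizations.
rng = random.Random(0)
nominal = (1.0, -0.1)
replay = [Transition(state=(0.0,), action=1, reward_features=(1.0, 2.0),
                     next_state=(0.5,), done=False)]

relabeled = relabel(replay, (1.0, -0.5))          # a fixed alternative parameterization
perturbed = relabel(replay, sample_weights(nominal, 0.2, rng))  # a sampled one
```

The same stored transition yields a different training target for each parameterization, which is why no extra environment interaction is needed beyond the data collected under the nominal objective.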

