Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control
This paper provides a theoretical justification for goal-conditioned RL, of interest to researchers in reinforcement learning and control theory.
The paper analyzes goal-conditioned RL from an optimal control perspective, deriving an optimality gap that helps explain why it succeeds where classical dense rewards can falter, and connects it to dual control in partially observed settings. Experiments on nonlinear and uncertain environments, using both RL and predictive control, show the advantages of goal-conditioned policies.
Goal-conditioned reinforcement learning (RL) concerns the problem of training an agent to maximize the probability of reaching target goal states. This paper presents an analysis of the goal-conditioned setting based on optimal control. In particular, we derive an optimality gap between more classical, often quadratic, objectives and the goal-conditioned reward, elucidating the success of goal-conditioned RL and why classical "dense" rewards can falter. We then consider the partially observed Markov decision setting and connect state estimation to our probabilistic reward, making the goal-conditioned reward well suited to dual control problems. The advantages of goal-conditioned policies are validated on nonlinear and uncertain environments using both RL and predictive control techniques.
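To make the contrast between a quadratic objective and a probabilistic goal-reaching reward concrete, the following is a minimal sketch and not the paper's formulation. It assumes the goal-conditioned reward can be modeled as an unnormalized Gaussian likelihood of the goal, exp(-cost/2); the function names, the scalar `weight`, and the sample states are all hypothetical. The sketch only illustrates that, by Jensen's inequality, the expected reward is not the same quantity as the reward evaluated at the expected quadratic cost, so the two objectives can rank uncertain outcomes differently.

```python
import math


def quadratic_cost(state, goal, weight=1.0):
    """Classical dense cost: weighted squared distance to the goal."""
    return weight * sum((s - g) ** 2 for s, g in zip(state, goal))


def goal_reward(state, goal, weight=1.0):
    """Hypothetical goal-conditioned reward (an assumption for illustration):
    unnormalized Gaussian likelihood of the goal, bounded in (0, 1]."""
    return math.exp(-0.5 * quadratic_cost(state, goal, weight))


goal = (0.0, 0.0)
# Hypothetical equally likely outcomes of an uncertain system.
samples = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]

costs = [quadratic_cost(s, goal) for s in samples]
rewards = [goal_reward(s, goal) for s in samples]

mean_cost = sum(costs) / len(costs)
expected_reward = sum(rewards) / len(rewards)
# Reward evaluated at the average quadratic cost.
reward_at_mean_cost = math.exp(-0.5 * mean_cost)

for s, c, r in zip(samples, costs, rewards):
    print(f"state={s} quadratic_cost={c:.3f} goal_reward={r:.6f}")

# Jensen's inequality for the convex exponential:
# E[exp(-c/2)] >= exp(-E[c]/2).
print(f"expected reward:     {expected_reward:.6f}")
print(f"reward at mean cost: {reward_at_mean_cost:.6f}")
```

In this toy setting the quadratic cost is dominated by the single distant outcome, while the bounded reward is dominated by the outcomes that land at or near the goal. Whether this matches the specific optimality gap derived in the paper is not something the abstract states.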