LG · Nov 7, 2025

Distributionally Robust Self Paced Curriculum Reinforcement Learning

Anirudh Satheesh, Keenan Powell, Vaneet Aggarwal

arXiv:2511.05694v2 · h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses unstable and overly conservative policies in distributionally robust reinforcement learning, and represents an incremental improvement over fixed scheduling of the robustness budget.

The paper tackles the trade-off between performance and robustness in reinforcement learning under distribution shifts by proposing DR-SPCRL, which adaptively schedules the robustness budget as a curriculum, resulting in an average 11.8% increase in episodic return under perturbations and about 1.9x the performance of nominal RL algorithms.

A central challenge in reinforcement learning is that policies trained in controlled environments often fail under distribution shifts when deployed in real-world environments. Distributionally Robust Reinforcement Learning (DRRL) addresses this by optimizing for worst-case performance within an uncertainty set defined by a robustness budget $\epsilon$. However, fixing $\epsilon$ forces a trade-off between performance and robustness: small values yield high nominal performance but weak robustness, while large values can result in instability and overly conservative policies. We propose Distributionally Robust Self-Paced Curriculum Reinforcement Learning (DR-SPCRL), a method that overcomes this limitation by treating $\epsilon$ as a continuous curriculum. DR-SPCRL adaptively schedules the robustness budget according to the agent's progress, enabling a balance between nominal and robust performance. Empirical results across multiple environments demonstrate that DR-SPCRL not only stabilizes training but also achieves a superior robustness-performance trade-off, yielding an average 11.8\% increase in episodic return under varying perturbations compared to fixed or heuristic scheduling strategies, and achieving approximately 1.9$\times$ the performance of the corresponding nominal RL algorithms.
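
The abstract does not state the update rule DR-SPCRL uses, so the following is only a toy illustration of the general idea of scheduling the robustness budget by agent progress. Everything in it is an assumption made for illustration: the class name `SelfPacedBudget`, the threshold test, the fixed step size, and the helper `run_schedule` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SelfPacedBudget:
    """Toy progress-gated schedule for the robustness budget epsilon.

    Hypothetical illustration only: the threshold rule and step size are
    assumptions, not the update rule used by DR-SPCRL.
    """
    eps_max: float          # largest robustness budget the curriculum may reach
    step: float = 0.05      # amount epsilon grows after a successful check
    threshold: float = 0.8  # required fraction of the nominal return
    eps: float = 0.0        # current budget; 0.0 corresponds to nominal training

    def update(self, robust_return: float, nominal_return: float) -> float:
        """Grow epsilon only if the agent copes with the current budget."""
        copes = (
            nominal_return > 0
            and robust_return >= self.threshold * nominal_return
        )
        if copes:
            # Harden the curriculum, but never exceed the target budget.
            self.eps = min(self.eps_max, self.eps + self.step)
        # Otherwise hold epsilon fixed so training can catch up.
        return self.eps


def run_schedule(pairs, eps_max=0.2):
    """Feed (robust_return, nominal_return) pairs through a fresh schedule."""
    sched = SelfPacedBudget(eps_max=eps_max)
    return [sched.update(robust, nominal) for robust, nominal in pairs]
```

In this sketch the budget stays put whenever the return under the current budget falls below the chosen fraction of the nominal return, which is one simple way to avoid the instability the abstract attributes to large fixed values of $\epsilon$. A real implementation would replace the threshold test with the paper's own progress criterion.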
