The Geometry of Nonlinear Reinforcement Learning
This work addresses the challenge of integrating diverse objectives in reinforcement learning and is aimed at both researchers and practitioners. Its contribution appears incremental, since it builds on existing geometric concepts.
The authors tackle the problem of unifying reward maximization, safe exploration, and intrinsic motivation in reinforcement learning. They propose a geometric framework that views these objectives as instances of a single optimization problem on the space of achievable long-term behavior. The result is a generalization of classical methods such as policy mirror descent and natural policy gradient to nonlinear utilities and convex constraints.
Reward maximization, safe exploration, and intrinsic motivation are often studied as separate objectives in reinforcement learning (RL). We present a unified geometric framework that views these goals as instances of a single optimization problem on the space of achievable long-term behavior in an environment. Within this framework, classical methods such as policy mirror descent, natural policy gradient, and trust-region algorithms naturally generalize to nonlinear utilities and convex constraints. We illustrate how this perspective captures robustness, safety, exploration, and diversity objectives, and outline open challenges at the interface of geometry and deep RL.
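As a minimal sketch of the central object in such a framework, the Python snippet below assumes that "long-term behavior" is represented by the normalized discounted state-action occupancy measure of a tabular MDP. This is a common choice, but it is an assumption and is not stated in the text above. The function names, the random three-state MDP, and the cost budget are hypothetical and serve only as illustration. Under this assumption, reward maximization is a linear function of the occupancy, an entropy-based exploration objective is a nonlinear (concave) function of it, and a safety requirement is a linear, hence convex, constraint on it.

```python
import numpy as np

def occupancy_measure(P, pi, mu0, gamma):
    """Normalized discounted state-action occupancy d(s, a) of policy pi.

    P:   transition tensor, shape (S, A, S), P[s, a, t] = Pr(t | s, a)
    pi:  policy, shape (S, A), each row sums to 1
    mu0: initial state distribution, shape (S,)
    """
    S = P.shape[0]
    # State-to-state transition matrix induced by pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve the flow equation d_state = (1 - gamma) * mu0 + gamma * P_pi^T d_state.
    d_state = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    # Spread the state occupancy over actions according to the policy.
    return d_state[:, None] * pi

def linear_utility(d, r):
    """Reward maximization: a linear function of the occupancy."""
    return float(np.sum(d * r))

def entropy_utility(d, eps=1e-12):
    """Exploration: entropy of the state marginal, nonlinear in the occupancy."""
    s = d.sum(axis=1)
    return float(-np.sum(s * np.log(s + eps)))

def satisfies_cost_budget(d, c, budget):
    """Safety: a linear (convex) constraint on the expected cost."""
    return bool(np.sum(d * c) <= budget)

# Hypothetical random MDP with 3 states and 2 actions.
rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
pi = rng.dirichlet(np.ones(A), size=S)       # shape (S, A)
mu0 = np.full(S, 1.0 / S)
r = rng.uniform(size=(S, A))                 # hypothetical rewards
c = rng.uniform(size=(S, A))                 # hypothetical costs

d = occupancy_measure(P, pi, mu0, gamma)
print("expected reward:", linear_utility(d, r))
print("state entropy:  ", entropy_utility(d))
print("within budget:  ", satisfies_cost_budget(d, c, budget=0.5))
```

All three quantities are evaluated on the same object `d`, which is the sense in which the objectives can be treated as one optimization problem. Optimizing over policies, for example with mirror descent, is not shown here.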