On Distributional Reinforcement Learning in Chaotic Dynamical Systems
This work provides a theoretical foundation for using distributional RL in chaotic systems, which are prevalent in scientific and engineering domains. The results are primarily analytical, without empirical validation or concrete performance numbers.
The paper shows that distributional reinforcement learning (RL) yields better-conditioned optimization in chaotic dynamical systems because the return distribution evolves more smoothly than individual trajectories under the 1-Wasserstein metric, providing a principled explanation for distributional RL's advantages in such settings.
Chaotic dynamical systems pose a fundamental challenge for reinforcement learning (RL): exponential sensitivity to initial conditions induces high-variance bootstrap targets and poorly conditioned gradient updates. Chaotic dynamics arise across scientific and engineering domains, from fluid flows and climate systems to multi-agent systems, where reliable learning is highly desirable. Standard RL methods optimise expected returns through scalar value functions, implicitly averaging over diverging trajectories and entangling trajectory-level instability with the learning objective. We show that, under mild statistical stability assumptions, the return distribution evolves more regularly than individual trajectories when measured under the $1$-Wasserstein metric, yielding a smoother distributional Bellman objective. By aligning optimisation with this measure-level structure, distributional RL provides better-conditioned learning. We thereby offer a principled explanation for the advantages of distributional methods in chaotic systems and for the geometry of RL objectives under chaos.
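The contrast between trajectory-level and measure-level sensitivity can be illustrated numerically. The following is a minimal sketch and is not taken from the paper. It assumes a toy uncontrolled system (the logistic map $x_{t+1} = 4x_t(1 - x_t)$), a hypothetical reward $r(x) = x$, and illustrative values for the discount factor, horizon, perturbation size and initial-state ensemble. It compares the average gap between returns of individually perturbed trajectories with the empirical $1$-Wasserstein distance between the two resulting return distributions.

```python
import numpy as np


def discounted_returns(x0, gamma=0.99, horizon=300):
    """Discounted return sum_t gamma^t * x_t along the logistic map.

    The map x -> 4x(1-x) and the reward r(x) = x are illustrative
    assumptions, chosen only because the map is a standard chaotic system.
    """
    x = np.asarray(x0, dtype=np.float64).copy()
    g = np.zeros_like(x)
    discount = 1.0
    for _ in range(horizon):
        g += discount * x            # accumulate discounted reward
        x = 4.0 * x * (1.0 - x)      # one step of the chaotic map
        discount *= gamma
    return g


def wasserstein_1(a, b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal coupling matches sorted samples.
    """
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


def compare(n=4000, eps=1e-8, seed=0):
    """Return (mean trajectory-wise return gap, W1 between return samples)."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(0.2, 0.3, size=n)        # assumed initial-state ensemble
    g_a = discounted_returns(x0)
    g_b = discounted_returns(x0 + eps)        # every initial state shifted by eps
    pointwise_gap = float(np.mean(np.abs(g_a - g_b)))
    w1 = wasserstein_1(g_a, g_b)
    return pointwise_gap, w1


if __name__ == "__main__":
    gap, w1 = compare()
    print(f"mean |G(x0) - G(x0 + eps)| = {gap:.4f}")
    print(f"W1 between return samples  = {w1:.4f}")
```

Because pairing each trajectory with its perturbed copy is itself a coupling of the two return samples, the empirical $W_1$ value can never exceed the mean trajectory-wise gap. In this toy setting the $W_1$ estimate is expected to be limited mainly by sampling noise, whereas the trajectory-wise gap reflects the exponential divergence of nearby initial conditions. The sketch illustrates the intuition only and does not test the paper's assumptions or results.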