AI LG DS OCMar 4, 2024

Koopman-Assisted Reinforcement Learning

Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, Steven L. Brunton

arXiv:2403.02290v116.615 citationsh-index: 78

Originality Incremental advance

AI Analysis

This addresses computational challenges in reinforcement learning for control applications, offering a flexible framework for deterministic or stochastic systems, though it appears incremental as an enhancement to existing max-entropy methods.

The paper tackles the intractability of Bellman and Hamilton-Jacobi-Bellman equations in high-dimensional, nonlinear reinforcement learning systems by developing Koopman-assisted algorithms that lift dynamics into linear coordinates, achieving state-of-the-art performance on four controlled systems compared to traditional baselines.

The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a ``Koopman tensor'' that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.

View on arXiv PDF

Similar