An Output Feedback Q-learning Algorithm for Optimal Control of Nonlinear Systems with Koopman Linear Embedding
This work addresses optimal control of nonlinear systems in reinforcement learning and offers a method that preserves theoretical guarantees. The contribution is incremental, however, since it extends an existing algorithm to a specific class of systems.
The paper tackles the challenge of applying reinforcement learning with strong theoretical guarantees to nonlinear systems. It applies an output-feedback Q-learning algorithm to systems that admit a Koopman linear embedding, achieving the same guarantees as for linear time-invariant systems without requiring a system model or function approximation.
In the reinforcement learning literature, strong theoretical guarantees have been obtained for algorithms applicable to linear time-invariant (LTI) systems. In the nonlinear case, however, only weaker results have been obtained, and mostly for algorithms that rely on function approximation strategies such as neural networks. In this paper, we study the applicability of a known output-feedback Q-learning algorithm to the class of nonlinear systems that admit a Koopman linear embedding. This algorithm uses only input-output data and requires no knowledge of either the system model or the Koopman lifting functions. Moreover, no function approximation techniques are used, and the same theoretical guarantees as for LTI systems are preserved. Furthermore, we analyze the performance of the algorithm when the Koopman linear embedding is only an approximation of the real nonlinear system. A simulation example verifies the applicability of this method.
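
To make the phrase "admits a Koopman linear embedding" concrete, the following minimal sketch uses a textbook-style discrete-time system that is not taken from the paper. The parameters `a`, `b`, `c`, the lifting function `lift`, the matrices `A` and `B`, and all function names are assumptions chosen for illustration. The sketch only checks numerically that the lifted state evolves linearly. It does not implement the Q-learning algorithm, which according to the abstract needs neither the model nor the lifting functions.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
a, b, c = 0.9, 0.5, 0.3


def nonlinear_step(x, u):
    """One step of the assumed nonlinear system:
    x1+ = a*x1,  x2+ = b*x2 + c*x1^2 + u."""
    x1, x2 = x
    return np.array([a * x1, b * x2 + c * x1**2 + u])


def lift(x):
    """Assumed lifting functions z = (x1, x2, x1^2)."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])


# In the lifted coordinates the dynamics are exactly linear: z+ = A z + B u.
# The third row follows from (a*x1)^2 = a^2 * x1^2.
A = np.array([
    [a,   0.0, 0.0],
    [0.0, b,   c],
    [0.0, 0.0, a**2],
])
B = np.array([0.0, 1.0, 0.0])


def max_embedding_error(x0, inputs):
    """Simulate both models side by side and return the largest
    absolute mismatch between lift(x_k) and the linear state z_k."""
    x = np.array(x0, dtype=float)
    z = lift(x)
    err = 0.0
    for u in inputs:
        x = nonlinear_step(x, u)
        z = A @ z + B * u
        err = max(err, float(np.max(np.abs(lift(x) - z))))
    return err


rng = np.random.default_rng(0)
error = max_embedding_error([1.0, -2.0], rng.normal(size=50))
print(f"max mismatch between lifted nonlinear state and linear model: {error:.2e}")
```

In this toy case the mismatch stays at the level of floating-point rounding, because the embedding is exact. When a lifting only approximates the nonlinear dynamics, the mismatch would be nonzero, which is the situation the abstract's approximation analysis refers to.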