LG · AI · RO · Jun 26, 2023

Beyond dynamic programming

arXiv:2306.15029v1 · 2.0 · h-index: 1
Originality: Incremental advance
AI Analysis

The paper proposes a new theoretical framework for reinforcement learning that may address limitations of dynamic programming, but it appears incremental because it builds on existing concepts.

The paper tackles reinforcement learning by introducing Score-life programming, a theoretical approach that directly computes optimal infinite horizon action sequences without requiring a policy function, and demonstrates its effectiveness on nonlinear optimal control problems.

In this paper, we present Score-life programming, a novel theoretical approach for solving reinforcement learning problems. In contrast with classical dynamic programming-based methods, our method can search over non-stationary policy functions, and can directly compute optimal infinite horizon action sequences from a given state. The central idea in our method is the construction of a mapping between infinite horizon action sequences and real numbers in a bounded interval. This construction enables us to formulate an optimization problem for directly computing optimal infinite horizon action sequences, without requiring a policy function. We demonstrate the effectiveness of our approach by applying it to nonlinear optimal control problems. Overall, our contributions provide a novel theoretical framework for formulating and solving reinforcement learning problems.
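The abstract does not spell out how the mapping between action sequences and real numbers is built. As a minimal sketch, assume a finite action set of size M and read an action sequence as the digits of a base-M expansion, which places every sequence in the interval [0, 1). The function names (`encode`, `decode`, `score`, `best_life`), the toy dynamics, the truncated horizon, and the brute-force search below are all illustrative assumptions, not the paper's construction or algorithm.

```python
import math
from fractions import Fraction


def encode(actions, num_actions):
    """Map a finite action prefix to a number in [0, 1) via a base-M expansion."""
    value = Fraction(0)
    for k, a in enumerate(actions):
        if not 0 <= a < num_actions:
            raise ValueError(f"action {a} is outside 0..{num_actions - 1}")
        value += Fraction(a, num_actions ** (k + 1))
    return value


def decode(value, num_actions, horizon):
    """Recover the first `horizon` actions (base-M digits) of a number in [0, 1)."""
    value = Fraction(value)
    actions = []
    for _ in range(horizon):
        value *= num_actions
        digit = int(value)  # integer part is the next action index
        actions.append(digit)
        value -= digit
    return actions


def score(life, x0, num_actions=2, horizon=8, gamma=0.9):
    """Discounted return of the action sequence encoded by `life`.

    Hypothetical toy problem: scalar nonlinear dynamics, reward -x^2.
    """
    x, total = x0, 0.0
    for k, a in enumerate(decode(life, num_actions, horizon)):
        u = -1.0 + 2.0 * a / (num_actions - 1)  # action index -> control in [-1, 1]
        x = x + 0.1 * (u - math.sin(x))
        total += gamma ** k * (-(x ** 2))
    return total


def best_life(x0, num_actions=2, horizon=8, gamma=0.9):
    """Brute-force search over every number with `horizon` base-M digits."""
    grid = num_actions ** horizon
    candidates = (Fraction(i, grid) for i in range(grid))
    return max(candidates, key=lambda l: score(l, x0, num_actions, horizon, gamma))


if __name__ == "__main__":
    life = best_life(x0=1.0)
    print("best number in [0, 1):", life)
    print("decoded action sequence:", decode(life, 2, 8))
```

Here the search variable is a single real number rather than a policy function: each candidate decodes to its own open-loop action sequence from the given state, so nothing forces the actions to follow a stationary rule. The exhaustive grid is only practical for short truncated horizons and stands in for whatever optimization method the paper actually develops.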
