Dynamical System Optimization
This work offers an optimization framework for dynamical systems that could simplify policy tuning in domains such as control and AI, though the contribution appears incremental because it builds on existing policy gradient methods.
The paper addresses policy optimization in dynamical systems. It proposes a framework in which a system with a fixed parametric policy is treated as an autonomous system, so that policy parameters can be optimized without directly using the machinery of approximate dynamic programming and reinforcement learning. It shows that the resulting algorithms compute the same quantities as existing policy optimization techniques, and that the framework extends to applications such as behavioral cloning and tuning of generative AI models.
We develop an optimization framework centered around a core idea: once a (parametric) policy is specified, control authority is transferred to the policy, resulting in an autonomous dynamical system. Thus we should be able to optimize policy parameters without further reference to controls or actions, and without directly using the machinery of approximate Dynamic Programming and Reinforcement Learning. Here we derive simpler algorithms at the autonomous system level, and show that they compute the same quantities as policy gradients and Hessians, natural gradients, and proximal methods. Analogs to approximate policy iteration and off-policy learning are also available. Since policy parameters and other system parameters are treated uniformly, the same algorithms apply to behavioral cloning, mechanism design, system identification, and learning of state estimators. Tuning of generative AI models is not only possible, but is conceptually closer to the present framework than to Reinforcement Learning.
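The core idea can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a toy example under stated assumptions. We assume a scalar linear system x' = a*x + b*u with a linear policy u = -theta*x, so the closed loop becomes the autonomous system x' = (a - b*theta)*x, and the per-step cost x^2 + r*u^2 becomes (1 + r*theta^2)*x^2. The gradient of the total cost with respect to theta is then computed by propagating the sensitivity s = dx/dtheta forward in time, with no reference to actions. The function name `total_cost_and_grad` and the constants a, b, r, the initial state, and the horizon are all hypothetical choices made for illustration.

```python
def total_cost_and_grad(theta, x0=1.0, horizon=20, a=1.1, b=0.5, r=0.1):
    """Total cost of the autonomous closed-loop system and its gradient in theta.

    Closed-loop dynamics (assumed): x_next = (a - b*theta) * x
    Closed-loop cost (assumed):     l(x, theta) = (1 + r*theta**2) * x**2
    The gradient uses forward sensitivity propagation, s = dx/dtheta.
    """
    x, s = x0, 0.0        # state and its sensitivity to theta (x0 does not depend on theta)
    cost, grad = 0.0, 0.0
    for _ in range(horizon):
        # Accumulate the cost and its total derivative: l_x * s + l_theta.
        cost += (1.0 + r * theta**2) * x**2
        grad += 2.0 * (1.0 + r * theta**2) * x * s + 2.0 * r * theta * x**2
        # Propagate state and sensitivity: s_next = f_x * s + f_theta.
        # The right-hand side is evaluated with the current x before assignment.
        x, s = (a - b * theta) * x, (a - b * theta) * s - b * x
    return cost, grad


if __name__ == "__main__":
    cost, grad = total_cost_and_grad(1.0)
    print("cost:", cost, "gradient:", grad)
```

Because theta enters only through the closed-loop dynamics and cost, the same recursion would apply unchanged to any other system parameter, which is the sense in which policy parameters and other parameters can be treated uniformly.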