How are policy gradient methods affected by the limits of control?
The paper addresses fundamental performance limits of reinforcement learning for control systems, but the contribution appears incremental because it builds on established control-theoretic concepts.
The paper investigates how control-theoretic limitations affect stochastic policy gradient methods. It finds that linear systems that are ill-conditioned (in the sense of Doyle) lead to noisy gradient estimates, and that policy gradient methods can suffer from the curse of dimensionality even on a class of stable systems.
We study stochastic policy gradient methods from the perspective of control-theoretic limitations. Our main result is that ill-conditioned linear systems in the sense of Doyle inevitably lead to noisy gradient estimates. We also give an example of a class of stable systems in which policy gradient methods suffer from the curse of dimensionality. Our results apply to both state feedback and partially observed systems.
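To make "stochastic policy gradient" concrete, here is a minimal sketch of the standard REINFORCE (score-function) gradient estimator for a scalar linear system under a Gaussian state-feedback policy. This is an illustration under stated assumptions, not the paper's construction. The dynamics coefficients `a` and `b`, the quadratic cost weights `q` and `r`, the noise level `sigma`, the horizon, and the function names `rollout_cost_and_score` and `reinforce_gradient` are all hypothetical. A scalar system cannot be ill-conditioned in Doyle's sense, so the sketch only shows where the estimator's noise comes from: each rollout contributes the product of a random total cost and a random score.

```python
import random
import statistics


def rollout_cost_and_score(k, a=1.0, b=1.0, q=1.0, r=0.1,
                           sigma=0.5, horizon=10, x0=1.0, rng=random):
    """Simulate x_{t+1} = a*x_t + b*u_t with Gaussian policy u_t = k*x_t + sigma*w_t.

    All parameters are hypothetical illustration values. Returns the total
    quadratic cost and the summed score d/dk log pi(u_t | x_t).
    """
    x = x0
    cost = 0.0
    score = 0.0
    for _ in range(horizon):
        w = rng.gauss(0.0, 1.0)              # exploration noise w_t ~ N(0, 1)
        u = k * x + sigma * w                # stochastic state-feedback action
        cost += q * x * x + r * u * u        # stage cost q*x^2 + r*u^2
        score += (u - k * x) * x / sigma ** 2  # d/dk of the Gaussian log-density
        x = a * x + b * u                    # linear dynamics (noise enters via u)
    cost += q * x * x                        # terminal state cost
    return cost, score


def reinforce_gradient(k, num_rollouts=1000, seed=0, **kwargs):
    """Monte Carlo REINFORCE estimate of dJ/dk and its per-rollout sample variance."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_rollouts):
        cost, score = rollout_cost_and_score(k, rng=rng, **kwargs)
        samples.append(cost * score)         # single-rollout gradient estimate
    return statistics.fmean(samples), statistics.variance(samples)
```

For example, `reinforce_gradient(0.0, num_rollouts=500)` returns a gradient estimate and the per-rollout variance. Comparing that variance with the squared mean across horizons or noise levels gives a rough feel for estimator noise. Subtracting a baseline from `cost` is a common variance-reduction step that is deliberately omitted here.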