Second-Order MPC-Based Distributed Q-Learning
For multi-agent control systems that use MPC-based Q-learning, this work provides a faster and more stable learning method.
This work extends MPC-based distributed Q-learning to second-order gradient updates, enabling faster convergence and higher learning rates without instability. Simulations show it significantly outperforms first-order distributed Q-learning.
The state of the art for model predictive control (MPC)-based distributed Q-learning is limited to first-order gradient updates of the MPC parameterization. In general, using second-order information can significantly improve the speed of convergence for learning, allowing the use of higher learning rates without introducing instability. This work presents a second-order extension to MPC-based Q-learning with updates distributed across local agents, relying only on locally available information and neighbor-to-neighbor communication. In simulation, the approach is demonstrated to significantly outperform first-order distributed Q-learning.
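The following is a minimal, centralized sketch of why second-order information helps. It is not the distributed MPC-based method described above. It assumes a hypothetical linear Q-function $Q_\theta(s,a) = \theta^\top \phi(s,a)$, fixed Q-learning targets $y_i$, and the batch loss $L(\theta) = \tfrac12 \sum_i \delta_i^2$ with TD errors $\delta_i = y_i - Q_\theta(s_i,a_i)$. In general $\nabla L = -\sum_i \delta_i \nabla_\theta Q$ and $\nabla^2 L = \sum_i (\nabla_\theta Q \nabla_\theta Q^\top - \delta_i \nabla^2_\theta Q)$. For a linear $Q$ the second term vanishes, so the Newton step is exact. The features, targets, and function names are illustrative assumptions. The sketch compares a first-order update using the largest stable learning rate against a second-order (Newton) update on deliberately ill-conditioned features.

```python
import numpy as np


def td_gradient_and_hessian(theta, phi, y):
    """Gradient and Hessian of L(theta) = 0.5 * sum_i (y_i - theta^T phi_i)^2.

    Assumes a linear Q-function, so grad^2 Q = 0 and the Hessian is sum_i phi_i phi_i^T.
    """
    delta = y - phi @ theta      # TD errors against fixed (hypothetical) targets
    grad = -phi.T @ delta        # dL/dtheta = -sum_i delta_i * phi_i
    hess = phi.T @ phi           # d2L/dtheta2 for a linear Q
    return grad, hess


def first_order_step(theta, phi, y, alpha):
    """Plain gradient step: theta <- theta - alpha * grad."""
    grad, _ = td_gradient_and_hessian(theta, phi, y)
    return theta - alpha * grad


def second_order_step(theta, phi, y, alpha):
    """Newton step: theta <- theta - alpha * H^{-1} grad (solved, not inverted)."""
    grad, hess = td_gradient_and_hessian(theta, phi, y)
    return theta - alpha * np.linalg.solve(hess, grad)


rng = np.random.default_rng(0)
# Hypothetical ill-conditioned features: one direction ~100x larger than the other.
phi = rng.normal(size=(50, 2)) * np.array([10.0, 0.1])
theta_true = np.array([1.0, -2.0])
y = phi @ theta_true  # hypothetical fixed targets

# Largest stable learning rate for gradient descent on this quadratic loss.
alpha_gd = 1.0 / np.linalg.eigvalsh(phi.T @ phi).max()

theta_gd = np.zeros(2)
theta_newton = np.zeros(2)
for _ in range(20):
    theta_gd = first_order_step(theta_gd, phi, y, alpha_gd)
    theta_newton = second_order_step(theta_newton, phi, y, alpha=1.0)

print("first-order error: ", np.linalg.norm(theta_gd - theta_true))
print("second-order error:", np.linalg.norm(theta_newton - theta_true))
```

After 20 iterations, the first-order update has barely moved along the poorly scaled direction, because its step size is capped by the largest curvature. The Newton step rescales every direction by its own curvature and reaches the minimizer immediately. With a nonlinear MPC-based $Q_\theta$, the $-\delta_i \nabla^2_\theta Q$ term no longer vanishes, and the Hessian may need safeguarding (for example, regularization) to remain positive definite. This sketch does not address that case or the distributed computation.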