SY | LG | OC | Dec 19, 2019

Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach

arXiv:1912.09135v3 | 127 citations
Originality: Incremental advance
AI Analysis

This paper addresses decentralized control in large-scale systems such as multi-zone HVAC. The advance is incremental because it builds on existing policy gradient and consensus methods.

The paper tackles distributed reinforcement learning for decentralized linear quadratic control with partial observations and local costs. It proposes the Zero-Order Distributed Policy Optimization algorithm (ZODPO), which learns linear local controllers with limited communication and storage. The analysis gives a polynomial sample complexity, which supports scalability, and shows that the generated controllers are stabilizing with high probability.

This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then performs a local policy gradient step in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on multi-zone HVAC systems.
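The loop described in the abstract (consensus estimate of the global cost, then a local zero-order policy gradient step) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes one scalar state and one scalar gain per agent, a doubly stochastic mixing matrix `W`, a perturbation direction sampled jointly rather than in a distributed way, and a cost cap added here purely as a numerical safeguard. All function names and parameters (`consensus_average`, `rollout_local_costs`, `zodpo_sketch`, `radius`, `cost_cap`, and so on) are hypothetical.

```python
import numpy as np


def consensus_average(values, W, rounds):
    """Repeated neighbor averaging z <- W z. For a doubly stochastic W on a
    connected graph, every entry approaches the mean of the initial values."""
    z = np.asarray(values, dtype=float).copy()
    for _ in range(rounds):
        z = W @ z
    return z


def rollout_local_costs(K, A, B, horizon, rng, noise_std=0.1):
    """Simulate x_{t+1} = A x_t + B u_t + w_t with local feedback
    u_i = K_i * x_i, and return each agent's time-averaged local cost
    x_i^2 + u_i^2 (an assumed quadratic cost for this toy example)."""
    n = len(K)
    x = np.zeros(n)
    costs = np.zeros(n)
    for _ in range(horizon):
        u = K * x
        costs += x**2 + u**2
        x = A @ x + B @ u + noise_std * rng.standard_normal(n)
    return costs / horizon


def zodpo_sketch(A, B, W, K0, iters=200, horizon=50, radius=0.05,
                 step=0.01, rounds=20, cost_cap=10.0, seed=0):
    """Toy zero-order distributed policy optimization loop (assumptions above)."""
    rng = np.random.default_rng(seed)
    K = np.array(K0, dtype=float)
    n = len(K)
    for _ in range(iters):
        # Random perturbation direction on the unit sphere (sampled jointly here).
        D = rng.standard_normal(n)
        D /= np.linalg.norm(D)
        # Each agent records only its own cost under the perturbed gains.
        local = rollout_local_costs(K + radius * D, A, B, horizon, rng)
        # Consensus gives every agent an estimate of the average (global) cost.
        J_hat = np.minimum(consensus_average(local, W, rounds), cost_cap)
        # One-point zero-order estimate: agent i uses only J_hat[i] and D[i].
        grad = (n / radius) * J_hat * D
        K = K - step * grad
    return K
```

A possible usage, with a three-agent path graph and weakly coupled stable dynamics chosen only for illustration:

```python
W = np.array([[2/3, 1/3, 0.0], [1/3, 1/3, 1/3], [0.0, 1/3, 2/3]])
A = 0.5 * np.eye(3) + 0.05 * (np.ones((3, 3)) - np.eye(3))
K = zodpo_sketch(A, np.eye(3), W, np.zeros(3), iters=30)
```

The one-point estimator is noisy, so the step size and smoothing radius here are small and the result depends on the random seed.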

Code Implementations: 1 repo
