LG AI RO

Oct 22, 2022

Solving Continuous Control via Q-learning

Tim Seyde, Peter Werner, Wilko Schwarting, Igor Gilitschenski, Martin Riedmiller, Daniela Rus, Markus Wulfmeier

DeepMind

arXiv:2210.12566v2

16.131 citations

h-index: 55

Has Code

Originality: Highly original

AI Analysis

This work targets reinforcement learning researchers and practitioners who face the complexity and computational overhead of actor-critic methods, and offers a simpler critic-only alternative with competitive results.

The paper tackles the challenge of applying simpler critic-only Q-learning methods to continuous control tasks, which typically rely on more complex actor-critic approaches. It introduces a modification of deep Q-learning that combines bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning. The resulting method matches the performance of state-of-the-art actor-critic methods across a variety of tasks.

While there has been substantial success for solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements and wider hyperparameter search spaces. We show that a simple modification of deep Q-learning largely alleviates these issues. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods when learning from features or pixels. We extend classical bandit examples from cooperative MARL to provide intuition for how decoupled critics leverage state information to coordinate joint optimization, and demonstrate surprisingly strong performance across a variety of continuous control tasks.
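The combination of bang-bang discretization and value decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear critic, the random weights, the normalized action bounds of -1 and +1, and the choice of the mean as the decomposition operator are all assumptions made for illustration, and every function name here is hypothetical. The sketch shows why decoupling helps: when the joint value is an average of per-dimension utilities, the greedy joint action is found by an independent argmax in each action dimension, so the cost grows linearly with the number of dimensions rather than exponentially.

```python
import itertools
import random

# Assumed normalized action bounds; each dimension may only take an extreme value.
BANG_BANG = (-1.0, 1.0)


def decoupled_utilities(state, weights):
    """Hypothetical linear critic: one utility per (action dimension, bin)."""
    return [
        [sum(w * s for w, s in zip(row, state)) for row in dim_weights]
        for dim_weights in weights
    ]


def joint_q(utilities, bin_indices):
    """Assumed value decomposition: joint value is the mean of the
    per-dimension utilities of the selected bins."""
    selected = [u[b] for u, b in zip(utilities, bin_indices)]
    return sum(selected) / len(selected)


def greedy_action(utilities):
    """Independent argmax per dimension; linear in the number of dimensions."""
    bins = [max(range(len(u)), key=u.__getitem__) for u in utilities]
    return bins, [BANG_BANG[b] for b in bins]


random.seed(0)
state_dim, action_dim = 4, 3

# Random weights stand in for a trained network (illustration only).
weights = [
    [[random.uniform(-1, 1) for _ in range(state_dim)] for _ in BANG_BANG]
    for _ in range(action_dim)
]
state = [0.5, -0.2, 0.1, 0.9]

utilities = decoupled_utilities(state, weights)
bins, action = greedy_action(utilities)

# Brute-force search over all 2**action_dim joint actions for comparison.
best_joint = max(
    itertools.product(range(len(BANG_BANG)), repeat=action_dim),
    key=lambda b: joint_q(utilities, b),
)

print("greedy bang-bang action:", action)
print("matches exhaustive search:", list(best_joint) == bins)
```

Under these assumptions the per-dimension argmax and the exhaustive search agree, which is the property that lets a critic-only method avoid enumerating the full joint action space.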
