LG | AI | ML | Jan 6, 2024

SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning

arXiv:2401.03137v1 | 9 citations | h-index: 5 | Has Code | NIPS
Originality: Incremental advance
AI Analysis

This work addresses overestimation bias, a critical challenge for reinforcement learning on complex tasks and offline datasets. It is an incremental advance: it builds on existing ensemble methods, adding a novel theoretical approach to ensemble independence.

The paper tackles overestimation bias in deep reinforcement learning by proposing SPQR, a regularization method based on random matrix theory to ensure independence in Q-ensembles, and it outperforms baseline algorithms in online and offline RL benchmarks.

Alleviating overestimation bias is a critical challenge for deep reinforcement learning to achieve successful performance on more complex tasks or offline datasets containing out-of-distribution data. In order to overcome overestimation bias, ensemble methods for Q-learning have been investigated to exploit the diversity of multiple Q-functions. Since network initialization has been the predominant approach to promote diversity in Q-functions, heuristically designed diversity injection methods have been studied in the literature. However, previous studies have not attempted to approach guaranteed independence over an ensemble from a theoretical perspective. By introducing a novel regularization loss for Q-ensemble independence based on random matrix theory, we propose spiked Wishart Q-ensemble independence regularization (SPQR) for reinforcement learning. Specifically, we modify the intractable hypothesis testing criterion for the Q-ensemble independence into a tractable KL divergence between the spectral distribution of the Q-ensemble and the target Wigner's semicircle distribution. We implement SPQR in several online and offline ensemble Q-learning algorithms. In the experiments, SPQR outperforms the baseline algorithms in both online and offline RL benchmarks.
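The core idea in the abstract, comparing the spectral distribution of a Q-ensemble with Wigner's semicircle distribution through a KL divergence, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`semicircle_pdf`, `spectral_kl`), the square `n x n` input layout, the global standardization, the symmetrization step, the histogram binning, and the KL direction are all assumptions made for illustration, and the histogram estimate is not differentiable, so it serves as a diagnostic rather than a training loss.

```python
import numpy as np


def semicircle_pdf(x, radius=2.0):
    """Wigner semicircle density, supported on [-radius, radius]."""
    inside = np.clip(radius**2 - np.asarray(x, dtype=float) ** 2, 0.0, None)
    return 2.0 / (np.pi * radius**2) * np.sqrt(inside)


def spectral_kl(q_matrix, bins=20, span=3.0, eps=1e-8):
    """Histogram KL(empirical spectrum || semicircle) for a square matrix.

    q_matrix is a hypothetical (n, n) array of Q-values, e.g. n ensemble
    members evaluated on n state-action samples (an assumption of this sketch).
    """
    q = np.asarray(q_matrix, dtype=float)
    n = q.shape[0]
    if q.ndim != 2 or q.shape != (n, n):
        raise ValueError("q_matrix must be square")

    # Standardize entries, then symmetrize and scale so that i.i.d. entries
    # give a spectrum close to the semicircle on [-2, 2].
    z = (q - q.mean()) / (q.std() + eps)
    sym = (z + z.T) / np.sqrt(2.0 * n)

    # Clip outlying ("spiked") eigenvalues into the edge bins so they count.
    eigs = np.clip(np.linalg.eigvalsh(sym), -span, span)
    counts, edges = np.histogram(eigs, bins=bins, range=(-span, span))
    p = counts / counts.sum()

    # Target bin masses: average the density over sub-cell midpoints per bin.
    sub = 50
    width = edges[1] - edges[0]
    fine = np.linspace(-span, span, bins * sub, endpoint=False) + width / (2 * sub)
    t = semicircle_pdf(fine).reshape(bins, sub).mean(axis=1) * width

    # Smooth and renormalize both distributions to avoid log(0).
    p = (p + eps) / (p + eps).sum()
    t = (t + eps) / (t + eps).sum()
    return float(np.sum(p * np.log(p / t)))


rng = np.random.default_rng(0)
n = 200
independent = rng.standard_normal((n, n))            # independent entries
shared = rng.standard_normal((n, 1))                 # one signal shared by all members
correlated = shared + 0.1 * rng.standard_normal((n, n))

kl_independent = spectral_kl(independent)
kl_correlated = spectral_kl(correlated)
print(kl_independent, kl_correlated)
```

Under these assumptions, the matrix with independent entries should yield a much smaller divergence than the matrix whose columns share a common signal, since the shared signal produces outlying eigenvalues and a collapsed bulk that deviate from the semicircle.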

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
