LG · Feb 3, 2024

Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

arXiv:2402.02017v2 · 10 citations · h-index: 9 · NIPS
AI Analysis

This paper addresses the stitching problem in offline RL to improve decision-making, though it appears incremental, since it builds on existing RCSL and $Q$-function methods.

The paper tackles a key limitation of return-conditioned supervised learning (RCSL) in offline reinforcement learning: its inability to stitch together parts of suboptimal trajectories. It introduces Q-Aided Conditional Supervised Learning (QCS), which combines the stability of RCSL with the stitching ability of a learned $Q$-function, and reports significant performance improvements over both RCSL and value-based methods across benchmarks.

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
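To make "adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return" concrete, here is a minimal sketch of a loss with that shape. It is not the authors' implementation. It assumes a per-sample RCSL regression term (squared error between the policy's action and the dataset action), a critic value $Q(s, \hat{a})$ for the policy's action supplied from elsewhere, and a hypothetical linear weighting `q_aid_weight` that gives zero $Q$-aid to the highest-return trajectories and the most to the lowest-return ones. The exact weighting used in the paper may differ. The sketch computes only the loss value with NumPy; in training, the same expression would be written in an autodiff framework so gradients flow through $Q$ into the policy.

```python
import numpy as np


def q_aid_weight(traj_return, r_min, r_max, lam=1.0):
    """Hypothetical linear schedule: weight 0 at the best return, lam at the worst."""
    span = max(r_max - r_min, 1e-8)  # avoid division by zero for a flat dataset
    frac = (r_max - np.asarray(traj_return, dtype=float)) / span
    return lam * np.clip(frac, 0.0, 1.0)


def qcs_style_loss(pred_actions, data_actions, q_values, traj_returns,
                   r_min, r_max, lam=1.0):
    """RCSL regression loss plus a return-weighted Q-aid term, averaged over the batch.

    pred_actions, data_actions: (batch, action_dim) policy vs. dataset actions
    q_values: (batch,) assumed critic estimates of Q(s, pred_action)
    traj_returns: (batch,) return of the trajectory each sample came from
    """
    pred = np.asarray(pred_actions, dtype=float)
    data = np.asarray(data_actions, dtype=float)
    rcsl = np.sum((pred - data) ** 2, axis=1)  # supervised (behaviour-cloning) error
    w = q_aid_weight(traj_returns, r_min, r_max, lam)
    # Subtracting w * Q means minimizing the loss favours higher-value actions,
    # but only for samples drawn from lower-return trajectories.
    return float(np.mean(rcsl - w * np.asarray(q_values, dtype=float)))


# Usage: one sample from an optimal trajectory (return 10) and one from a poor one (return 0).
loss = qcs_style_loss(
    pred_actions=[[0.0], [1.0]],
    data_actions=[[0.0], [0.0]],
    q_values=[2.0, 4.0],
    traj_returns=[10.0, 0.0],
    r_min=0.0,
    r_max=10.0,
)
print(loss)  # -1.5
```

In this sketch the optimal-trajectory sample is trained by pure imitation (weight 0), while the poor-trajectory sample is pulled toward actions the critic values more. This mirrors the intuition that RCSL alone suffices on near-optimal data and $Q$-guided stitching matters most on suboptimal data.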

