LG · Feb 3, 2024

Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung

arXiv:2402.02017v2 · 11.5 · 11 citations · h-index: 9 · NIPS

Originality: Incremental advance

AI Analysis

The work addresses the stitching problem in offline RL to improve decision-making, although it appears incremental because it builds on existing RCSL and Q-function methods.

The paper tackles a key limitation of return-conditioned supervised learning (RCSL) in offline reinforcement learning, namely its lack of stitching ability, by introducing Q-Aided Conditional Supervised Learning (QCS). QCS combines the stability of RCSL with the stitching capability of Q-functions and yields significant performance improvements over both RCSL and value-based methods across benchmarks.

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but RCSL's lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. Building on an analysis of $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
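To make "adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return" concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes an RCSL term that is a squared error to the dataset action, a Q-aid term equal to $-Q(s, \pi(s, g))$, and a hypothetical linear schedule in which a sample's Q-aid weight grows as the return of its source trajectory falls (full weight `lam` at the dataset's lowest return, zero at its highest). The names `Sample`, `qaid_weight`, `qcs_style_loss`, `toy_policy`, and `toy_q`, as well as the use of scalar states and actions, are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Sample:
    """One transition (hypothetical layout): scalar state/action for simplicity."""
    state: float
    action: float       # dataset action a_t that RCSL imitates
    rtg: float          # return-to-go used as the conditioning input
    traj_return: float  # return of the whole trajectory this sample came from


def qaid_weight(traj_return: float, r_min: float, r_max: float, lam: float = 1.0) -> float:
    """Assumed linear schedule: weight lam for the lowest-return trajectory,
    0 for the highest-return one, clipped to [0, lam] in between."""
    if r_max == r_min:
        return 0.0
    frac = (r_max - traj_return) / (r_max - r_min)
    return lam * min(max(frac, 0.0), 1.0)


def qcs_style_loss(
    batch: Sequence[Sample],
    policy: Callable[[float, float], float],
    q_fn: Callable[[float, float], float],
    r_min: float,
    r_max: float,
    lam: float = 1.0,
) -> float:
    """Mean over the batch of RCSL loss + return-dependent weight * (-Q)."""
    total = 0.0
    for s in batch:
        a_pred = policy(s.state, s.rtg)    # return-conditioned action
        rcsl = (a_pred - s.action) ** 2    # supervised (imitation) term
        q_aid = -q_fn(s.state, a_pred)     # lower loss for higher-value actions
        total += rcsl + qaid_weight(s.traj_return, r_min, r_max, lam) * q_aid
    return total / len(batch)


# Toy stand-ins for a learned policy and critic (purely illustrative).
def toy_policy(state: float, rtg: float) -> float:
    return 0.5 * state + 0.1 * rtg


def toy_q(state: float, action: float) -> float:
    return -(action - 2.0) ** 2
```

With these toys, a sample from the best trajectory (`traj_return == r_max`) receives only the imitation term, while an identical sample from the worst trajectory also pays the full Q-aid penalty, so the critic has the most influence on low-return data. The linear schedule is just one simple choice with that property; other monotone schedules would fit the same description.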
