LG | Oct 28, 2025

Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings

Seyed Mahdi Basiri Azad, Joschka Boedecker

arXiv:2510.24432v1 | h-index: 29

Originality: Incremental advance

AI Analysis

This paper addresses slow learning in sparse-reward settings, a common obstacle for RL practitioners, and offers an incremental improvement over existing methods.

The paper tackles the challenge of reinforcement learning in sparse-reward environments by using a small number of demonstrations to initialize the value function, which accelerates convergence and improves sample efficiency, outperforming standard baselines in experiments.

Reinforcement learning (RL) in sparse-reward environments remains a significant challenge due to the lack of informative feedback. We propose a simple yet effective method that uses a small number of successful demonstrations to initialize the value function of an RL agent. By precomputing value estimates from offline demonstrations and using them as targets for early learning, our approach provides the agent with a useful prior over promising actions. The agent then refines these estimates through standard online interaction. This hybrid offline-to-online paradigm significantly reduces the exploration burden and improves sample efficiency in sparse-reward settings. Experiments on benchmark tasks demonstrate that our method accelerates convergence and outperforms standard baselines, even with minimal or suboptimal demonstration data.
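
The idea of precomputing value estimates from demonstrations and then refining them online can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the abstract does not specify how the estimates are computed, so the sketch assumes discounted Monte Carlo returns along each demonstration, a dictionary Q-table, and a hypothetical chain task with a single terminal reward. The names demo_returns, init_q_from_demos, and q_update are illustrative.

```python
from collections import defaultdict


def demo_returns(demo, gamma):
    """Discounted return-to-go for every step of one demonstration.

    `demo` is a list of (state, action, reward) tuples (assumed format).
    """
    targets = []
    g = 0.0
    # Walk backwards so each step's return reuses the one after it.
    for state, action, reward in reversed(demo):
        g = reward + gamma * g
        targets.append((state, action, g))
    targets.reverse()
    return targets


def init_q_from_demos(demos, gamma=0.9):
    """Seed a Q-table with the best demonstrated return per (state, action)."""
    q = defaultdict(float)  # pairs never demonstrated default to 0.0
    for demo in demos:
        for state, action, g in demo_returns(demo, gamma):
            key = (state, action)
            if key not in q or g > q[key]:
                q[key] = g
    return q


def q_update(q, state, action, reward, next_state, done, actions,
             alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update, used for online refinement."""
    bootstrap = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * bootstrap
    q[(state, action)] += alpha * (target - q[(state, action)])


# Hypothetical chain: states 0..3, reward 1.0 only on reaching state 3.
ACTIONS = ["left", "right"]
demo = [(0, "right", 0.0), (1, "right", 0.0), (2, "right", 1.0)]

q = init_q_from_demos([demo], gamma=0.9)

# One online transition along the demonstrated path.
q_update(q, state=1, action="right", reward=0.0, next_state=2,
         done=False, actions=ACTIONS)
```

After seeding, the demonstrated actions already carry nonzero values at states far from the reward, so a greedy policy has a path to follow before any online reward is observed. In this example the online update leaves the seeded value at state 1 unchanged, because the demonstration-derived estimate is already consistent with the one-step target.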
