LG, AI | Sep 9, 2024

State-Novelty Guided Action Persistence in Deep Reinforcement Learning

arXiv:2409.05433v1 | 2 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses sample inefficiency for DRL practitioners, offering an incremental improvement over existing action persistence techniques.

The paper tackles sample inefficiency in deep reinforcement learning by proposing a state-novelty guided method that dynamically adjusts action persistence. The authors report significantly improved sample efficiency on DMControl tasks without training additional value functions or policies.

While a powerful and promising approach, deep reinforcement learning (DRL) still suffers from sample inefficiency, which can be notably mitigated by more sophisticated techniques for addressing the exploration-exploitation dilemma. One such technique relies on action persistence (i.e., repeating an action over multiple steps). However, previous work exploiting action persistence either applies a fixed strategy or learns additional value functions (or policies) for selecting the repetition number. In this paper, we propose a novel method to dynamically adjust the action persistence based on the current exploration status of the state space. As a result, our method does not require training additional value functions or policies. Moreover, smooth scheduling of the repeat probability allows a more effective balance between exploration and exploitation. Furthermore, our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence. Finally, extensive experiments on different DMControl tasks demonstrate that our state-novelty guided action persistence method significantly improves sample efficiency.
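To make the idea concrete, here is a minimal Python sketch of novelty-guided action repetition. It is not the authors' implementation: the abstract does not specify how novelty is measured or how the repeat probability is scheduled. Everything below is an assumption made for illustration, including the class name `NoveltyGuidedRepeater`, the count-based novelty estimate over discretized states, the `bin_width` and `max_prob` parameters, and the choice to make rarely visited states more likely to repeat the previous action.

```python
import math
import random


class NoveltyGuidedRepeater:
    """Hypothetical sketch: repeat the previous action with a probability
    derived from a count-based novelty estimate of the current state."""

    def __init__(self, bin_width=0.5, max_prob=0.9, seed=0):
        self.bin_width = bin_width   # assumed discretization width for counting
        self.max_prob = max_prob     # assumed upper bound on the repeat probability
        self.counts = {}             # visit counts per discretized state
        self.rng = random.Random(seed)
        self.prev_action = None

    def _key(self, state):
        # Discretize a continuous state into a hashable grid cell.
        return tuple(math.floor(x / self.bin_width) for x in state)

    def novelty(self, state):
        # Count-based novelty in (0, 1]: 1 for unseen cells, decaying with visits.
        n = self.counts.get(self._key(state), 0)
        return 1.0 / math.sqrt(n + 1)

    def repeat_prob(self, state):
        # Assumed smooth schedule: the probability shrinks as the cell is revisited.
        return self.max_prob * self.novelty(state)

    def act(self, state, sample_action):
        """Return the previous action with probability repeat_prob(state);
        otherwise draw a fresh action from the base exploration strategy."""
        p = self.repeat_prob(state)
        key = self._key(state)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.prev_action is not None and self.rng.random() < p:
            return self.prev_action
        self.prev_action = sample_action()
        return self.prev_action
```

The `sample_action` callable stands in for any base exploration strategy (for example, a policy with added noise), which mirrors the abstract's claim that persistence can be layered onto existing strategies without training extra value functions or policies.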

