LG · Feb 15, 2024

Revisiting Experience Replayable Conditions

arXiv:2402.10374v2 · 5 citations · h-index: 3 · Applied Intelligence (Boston)

Originality: Incremental advance

AI Analysis

This work addresses the problem of enabling experience replay in on-policy reinforcement learning algorithms, which could improve efficiency and stability for researchers and practitioners, though it appears incremental as it modifies existing methods.

The paper reconsiders the conditions for applying experience replay in reinforcement learning, proposing stabilization tricks to address instability factors, and demonstrates that these tricks enable experience replay in an on-policy algorithm with performance comparable to a state-of-the-art off-policy algorithm.

Experience replay (ER), as used in (deep) reinforcement learning, is generally considered applicable only to off-policy algorithms. However, ER has in some cases been applied to on-policy algorithms, suggesting that off-policyness might be a sufficient, but not necessary, condition for applying ER. This paper reconsiders stricter "experience replayable conditions" (ERC) and proposes a way of modifying existing algorithms to satisfy them. It postulates that the instability of policy improvements is a pivotal factor in ERC. From the viewpoint of metric learning, the instability factors are identified as i) repulsive forces from negative samples and ii) replays of inappropriate experiences, and the corresponding stabilization tricks are derived. Numerical simulations confirm that the proposed stabilization tricks make ER applicable to advantage actor-critic, an on-policy algorithm. Moreover, its learning performance is comparable to that of soft actor-critic, a state-of-the-art off-policy algorithm.
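For readers unfamiliar with the mechanism the abstract refers to, the following is a minimal sketch of a generic uniform replay buffer: transitions are stored as they are collected and later drawn at random for updates. It is not the paper's method and does not implement its stabilization tricks. The names `Transition` and `ReplayBuffer`, the field layout, and the capacity and batch-size parameters are all illustrative assumptions.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One environment step (hypothetical field layout, for illustration)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Sample without replacement; cap at the number of stored items.
        batch_size = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)
```

Replayed transitions were generated by older versions of the policy, which is why ER is usually associated with off-policy algorithms and why applying it to an on-policy algorithm needs the extra care the abstract describes.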
