Sample Efficient Actor-Critic with Experience Replay
This work addresses the need of researchers and practitioners for more sample-efficient and stable reinforcement learning methods, though its contribution appears incremental, resting on a handful of specific innovations.
The paper tackles sample inefficiency in deep reinforcement learning by developing an actor-critic agent with experience replay that achieves stable, strong performance on challenging environments such as the Atari domain and continuous control tasks.
This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems. To achieve this, the paper introduces several innovations, including truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
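The abstract names truncated importance sampling with bias correction but does not spell it out. The sketch below illustrates the general idea as it is commonly described, not the paper's implementation. It assumes a single state with a small discrete action set and strictly positive probabilities under both policies. The function names, the threshold `c`, and the toy numbers are hypothetical and chosen only for illustration. The importance ratio rho = pi / mu is clipped at `c` to bound variance, and a second term, weighted under the target policy pi, restores the mass removed by clipping.

```python
import numpy as np

def truncated_weight(pi, mu, c=10.0):
    """Importance ratio pi/mu clipped at c (bounds the variance)."""
    rho = pi / mu
    return np.minimum(c, rho)

def correction_weight(pi, mu, c=10.0):
    """Weight of the bias-correction term: max(0, 1 - c/rho).

    It is zero wherever the ratio was not clipped (rho <= c).
    Assumes pi > 0 everywhere so that c/rho is finite.
    """
    rho = pi / mu
    return np.maximum(0.0, 1.0 - c / rho)

def corrected_expectation(pi, mu, f, c=10.0):
    """Exact expectation of the truncated estimator plus its correction.

    The first term is an expectation under the behaviour policy mu
    (what replayed data provides); the second is an expectation under
    the target policy pi (computable when the action set is small).
    """
    truncated = np.sum(mu * truncated_weight(pi, mu, c) * f)
    correction = np.sum(pi * correction_weight(pi, mu, c) * f)
    return truncated + correction

# Toy example (hypothetical numbers): target policy, behaviour policy,
# and an arbitrary per-action quantity f such as an advantage estimate.
pi = np.array([0.7, 0.2, 0.1])
mu = np.array([0.05, 0.45, 0.5])
f = np.array([1.0, -2.0, 3.0])

on_policy_value = np.sum(pi * f)                    # E_{a ~ pi}[f(a)]
estimate = corrected_expectation(pi, mu, f, c=2.0)  # ratio 14 is clipped to 2
```

In this toy setting the corrected expectation matches the on-policy value for any threshold, whereas the truncated term alone would be biased whenever a ratio exceeds `c`.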