Adaptive Data Exploitation in Deep Reinforcement Learning
It addresses data inefficiency in deep reinforcement learning and offers a practical solution for researchers and practitioners, though it appears incremental, since it builds on existing RL methods by adding adaptive data management.
The paper tackles data efficiency and generalization in deep reinforcement learning by introducing ADEPT, a framework that adaptively manages data usage with multi-armed bandit algorithms, achieving superior performance and computational efficiency on benchmarks like Procgen, MiniGrid, and PyBullet.
We introduce ADEPT: Adaptive Data ExPloiTation, a simple yet powerful framework to enhance **data efficiency** and **generalization** in deep reinforcement learning (RL). Specifically, ADEPT adaptively manages the use of sampled data across different learning stages via multi-armed bandit (MAB) algorithms, optimizing data utilization while mitigating overfitting. Moreover, ADEPT can significantly reduce computational overhead and accelerate a wide range of RL algorithms. We test ADEPT on benchmarks including Procgen, MiniGrid, and PyBullet. Extensive simulations demonstrate that ADEPT can achieve superior performance with remarkable computational efficiency, offering a practical solution for data-efficient RL. Our code is available at https://github.com/yuanmingqi/ADEPT.
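The abstract does not spell out the bandit formulation. The following is only a minimal sketch of how a multi-armed bandit *could* manage data reuse, and it rests on assumptions that do not come from the paper. The arms are taken to be candidate numbers of update epochs per rollout batch. The bandit is UCB1, one standard MAB algorithm that is not necessarily the one ADEPT uses. The reward is a simulated stand-in for a real training signal. The names `UCBDataScheduler`, `simulated_gain`, and `run_demo` are all hypothetical and are not taken from the ADEPT repository.

```python
import math
import random


class UCBDataScheduler:
    """UCB1 bandit over candidate data-reuse settings.

    Assumption (not from the paper): each arm is a number of update
    epochs to run on the current batch of sampled data.
    """

    def __init__(self, arms, c=2.0):
        self.arms = list(arms)
        self.c = c                             # exploration coefficient (c=2 gives classic UCB1)
        self.counts = [0] * len(self.arms)     # times each arm was chosen
        self.values = [0.0] * len(self.arms)   # running mean reward per arm
        self.t = 0                             # total number of selections

    def select(self):
        self.t += 1
        # Try every arm once before relying on confidence bounds.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # Mean reward plus an exploration bonus that shrinks with more pulls.
        scores = [
            v + math.sqrt(self.c * math.log(self.t) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(self.arms)), key=scores.__getitem__)

    def update(self, i, reward):
        self.counts[i] += 1
        # Incremental update of the mean reward for arm i.
        self.values[i] += (reward - self.values[i]) / self.counts[i]


def simulated_gain(epochs, rng):
    # Hypothetical stand-in for a real feedback signal (e.g. change in
    # average return); 4 epochs is the best arm by construction.
    true_mean = {1: 0.2, 2: 0.5, 4: 0.8, 8: 0.4}
    return true_mean[epochs] + rng.gauss(0.0, 0.1)


def run_demo(steps=2000, seed=0):
    rng = random.Random(seed)
    sched = UCBDataScheduler(arms=[1, 2, 4, 8])
    for _ in range(steps):
        i = sched.select()
        epochs = sched.arms[i]
        # In a real agent one would collect a rollout, run `epochs` update
        # passes over it, and then measure the chosen feedback signal.
        reward = simulated_gain(epochs, rng)
        sched.update(i, reward)
    return sched


if __name__ == "__main__":
    s = run_demo()
    for arm, n, v in zip(s.arms, s.counts, s.values):
        print(f"epochs={arm}: chosen {n} times, mean reward {v:.3f}")
```

In this toy setup the scheduler gradually concentrates its choices on the reuse setting with the highest observed reward while still occasionally probing the others. This is the general exploration-exploitation trade-off that a bandit-based data manager relies on. How ADEPT actually defines its arms and rewards should be checked against the paper and its code.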