LG AI

Nov 5, 2025

DQN Performance with Epsilon Greedy Policies and Prioritized Experience Replay

Daniel Perkins, Oscar J. Escobar, Luke Green

arXiv:2511.03670v1

h-index: 16

Originality: Synthesis-oriented

AI Analysis

This work addresses performance tuning for reinforcement learning practitioners, but it is incremental as it builds on established DQN methods.

The study examines how to tune Deep Q-Networks in finite environments by analyzing epsilon-greedy exploration schedules and prioritized experience replay. It finds that prioritized replay leads to faster convergence and higher returns than uniform replay or no replay.

We present a detailed study of Deep Q-Networks in finite environments, emphasizing the impact of epsilon-greedy exploration schedules and prioritized experience replay. Through systematic experimentation, we evaluate how variations in epsilon decay schedules affect learning efficiency, convergence behavior, and reward optimization. We investigate how prioritized experience replay leads to faster convergence and higher returns and show empirical results comparing uniform, no replay, and prioritized strategies across multiple simulations. Our findings illuminate the trade-offs and interactions between exploration strategies and memory management in DQN training, offering practical recommendations for robust reinforcement learning in resource-constrained settings.
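To make the two ingredients named in the abstract concrete, here is a minimal sketch of epsilon decay schedules with epsilon-greedy action selection, and of proportional prioritized sampling from a replay buffer. It is an illustration under assumptions, not the authors' implementation: the function names, the linear and exponential schedule forms, and the default values (`eps_start`, `eps_end`, `decay_steps`, `alpha`, `beta`) are all hypothetical, and the listing does not say which schedules or prioritization variant the paper uses.

```python
import math
import random


def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Linear decay from eps_start to eps_end over decay_steps, then constant.
    (Hypothetical schedule and defaults, chosen for illustration.)"""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def exponential_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Exponential decay from eps_start toward eps_end (assumed form)."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay_steps)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) is proportional to (|delta_i| + eps)^alpha.
    alpha = 0 recovers uniform replay."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Sample transition indices by priority and return importance-sampling
    weights (N * P(i))^(-beta), normalized by the largest weight in the batch."""
    probs = sampling_probabilities(td_errors, alpha)
    n = len(td_errors)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]
```

In a training loop, the TD errors stored for each transition would be refreshed after every update, and the returned weights would scale each sampled transition's loss. Setting `alpha=0` gives the uniform-replay baseline that the abstract compares against.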
