On the Importance of Exploration for Real Life Learned Algorithms
This work addresses the need for efficient exploration in reinforcement learning for real-time communication systems, but it is incremental, as it applies known methods to a specific domain.
The paper tackles the problem of improving data quality for learning algorithms by comparing exploration strategies in Deep Q-Networks (DQNs) that learn to puncture ongoing transmissions for URLLC. It shows that adaptive methods, namely variance-based and Maximum Entropy-based exploration, are more efficient than epsilon-greedy exploration.
The quality of data-driven learning algorithms depends strongly on the quality of the data available. One of the most straightforward ways to generate good data is to sample, or explore, the data source intelligently. Smart sampling can reduce the cost of acquiring samples, reduce the computational cost of learning, and enable the learning algorithm to adapt to unforeseen events. In this paper, we train three Deep Q-Networks (DQNs) with different exploration strategies to solve the problem of puncturing ongoing transmissions to serve Ultra-Reliable Low-Latency Communication (URLLC) messages. We demonstrate the efficiency of two adaptive exploration candidates, variance-based and Maximum Entropy-based exploration, compared to the standard, simple epsilon-greedy approach.
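To make the three exploration strategies concrete, the following is a minimal sketch of how each one could turn a vector of Q-values into an action. The abstract does not specify the exact formulations, so these are assumptions: epsilon-greedy is the textbook rule; Maximum Entropy-based exploration is modeled as a Boltzmann (softmax) policy over Q-values, which is the optimal policy of an entropy-regularized objective; and variance-based exploration is modeled as a hypothetical ensemble of Q-heads whose standard deviation acts as an exploration bonus. All function and parameter names (`epsilon_greedy`, `max_entropy_policy`, `variance_based_action`, `temperature`, `bonus_scale`) are illustrative, not taken from the paper.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def max_entropy_policy(q_values, temperature):
    """Boltzmann (softmax) distribution over actions: pi(a) ~ exp(Q(a) / temperature).

    Assumption: this stands in for "Maximum Entropy-based exploration".
    Subtracting the max keeps exp() numerically stable without changing pi.
    """
    q = np.asarray(q_values, dtype=float)
    z = (q - q.max()) / temperature
    p = np.exp(z)
    return p / p.sum()


def max_entropy_action(q_values, temperature, rng):
    """Sample an action from the Boltzmann policy."""
    p = max_entropy_policy(q_values, temperature)
    return int(rng.choice(len(p), p=p))


def variance_based_action(q_ensemble, bonus_scale):
    """Hypothetical variance-based rule: q_ensemble has shape (n_heads, n_actions).

    Actions on which the ensemble disagrees (high std) receive an optimism bonus,
    so exploration concentrates where the value estimate is most uncertain.
    """
    q = np.asarray(q_ensemble, dtype=float)
    score = q.mean(axis=0) + bonus_scale * q.std(axis=0)
    return int(np.argmax(score))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = [1.0, 2.0, 0.5]
    print(epsilon_greedy(q, epsilon=0.1, rng=rng))
    print(max_entropy_policy(q, temperature=1.0))
    print(variance_based_action([[1.0, 2.0, 0.0], [1.0, 2.0, 4.0]], bonus_scale=1.0))
```

The design difference this illustrates: epsilon-greedy explores uniformly at a fixed rate regardless of what the network has learned, whereas the softmax and variance-bonus rules adapt, exploring more among actions that are nearly as good or whose value is still uncertain.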