How do Offline Measures for Exploration in Reinforcement Learning behave?
This work addresses the need for reliable offline exploration metrics in reinforcement learning. Its contribution is incremental, however, since it builds on existing metrics and focuses on implementation details.
The paper tackles the problem of evaluating exploration in reinforcement learning independently of the learning algorithm. It compares three existing offline metrics on simple distributions, identifies problems with them, and proposes a fourth metric, uniform relative entropy. It also shows that implementation choices significantly affect these measures.
Sufficient exploration is paramount for the success of a reinforcement learning agent. Yet, exploration is rarely assessed in an algorithm-independent way. We compare the behavior of three data-based, offline exploration metrics described in the literature on intuitive simple distributions and highlight problems to be aware of when using them. We propose a fourth metric, uniform relative entropy, and implement it using either a k-nearest-neighbor or a nearest-neighbor-ratio estimator, highlighting that the implementation choices have a profound impact on these measures.
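
To make the k-nearest-neighbor variant concrete, the following is a minimal sketch and not the authors' implementation. It assumes that uniform relative entropy means the KL divergence from the empirical state distribution to a uniform distribution over a known bounding box, so that it equals log(volume) minus the differential entropy of the data. The entropy is estimated with the standard Kozachenko-Leonenko k-nearest-neighbor estimator. The function names, the bounding-box arguments `low` and `high`, and the default `k=3` are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln


def knn_entropy(samples, k=3):
    """Kozachenko-Leonenko estimate of differential entropy (in nats)."""
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    tree = cKDTree(x)
    # Query k + 1 neighbors because each point is its own nearest neighbor.
    dist, _ = tree.query(x, k=k + 1)
    # Distance to the k-th true neighbor; clamp to avoid log(0) on duplicates.
    eps = np.maximum(dist[:, -1], 1e-12)
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_unit_ball = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    return digamma(n) - digamma(k) + log_unit_ball + d * np.mean(np.log(eps))


def uniform_relative_entropy(samples, low, high, k=3):
    """Assumed definition: KL(data || uniform on the box [low, high])."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    log_volume = np.sum(np.log(high - low))
    # KL(p || U) = log(volume) - H(p) when p is supported inside the box.
    return log_volume - knn_entropy(samples, k=k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spread = rng.uniform(0.0, 1.0, size=(5000, 2))      # well-explored box
    clustered = rng.normal(0.5, 0.01, size=(5000, 2))   # tightly clustered
    print("spread:   ", uniform_relative_entropy(spread, [0, 0], [1, 1]))
    print("clustered:", uniform_relative_entropy(clustered, [0, 0], [1, 1]))
```

Under these assumptions, data that covers the box evenly should score near zero, while tightly clustered data should score much higher. Changing `k` or the assumed bounding box shifts the estimate, which is one simple way to see how strongly implementation choices can affect such a measure.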