Exploration Potential
The paper addresses efficient exploration in reinforcement learning, though the contribution appears incremental because it builds on existing concepts such as information gain.
The paper introduces exploration potential, a measure of how much a reinforcement learning agent has explored its environment class. The measure accounts for reward structure and yields an exploration criterion that is necessary and sufficient for asymptotic optimality. Experiments in multi-armed bandits demonstrate its use in analyzing exploration-exploitation tradeoffs.
We introduce exploration potential, a quantity that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem's reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class). Our experiments in multi-armed bandits use exploration potential to illustrate how different algorithms trade off exploration against exploitation.