More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences
This work addresses inefficient exploration in reinforcement learning and is relevant to both researchers and practitioners. Its contribution is incremental, since it builds on existing methods such as DQN.
The paper tackles the problem of incorporating prior knowledge about action sequence equivalences into reinforcement learning to improve exploration efficiency. It demonstrates that the proposed strategy reduces collisions and increases new state visitations in several environments.
Incorporating prior knowledge in reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence, that is, cases where different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.
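
To make the notion of action sequence equivalence concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a hypothetical unbounded grid world with four moves; the names `MOVES`, `end_state`, and `count_collisions` are illustrative only. It enumerates all action sequences of a given length and counts how many distinct states they reach.

```python
from itertools import product

# Hypothetical toy environment (an assumption, not from the paper):
# an unbounded 2D grid where each action shifts the agent by one cell.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def end_state(sequence, start=(0, 0)):
    """Apply a sequence of action names to `start` and return the final cell."""
    x, y = start
    for action in sequence:
        dx, dy = MOVES[action]
        x, y = x + dx, y + dy
    return (x, y)


def count_collisions(depth):
    """Return (number of action sequences, number of distinct end states)."""
    sequences = list(product(MOVES, repeat=depth))
    states = {end_state(seq) for seq in sequences}
    return len(sequences), len(states)


# Two equivalent sequences: the moves commute on this grid.
assert end_state(("up", "left")) == end_state(("left", "up"))

n_sequences, n_states = count_collisions(2)
print(n_sequences, n_states)  # 16 sequences, 9 distinct end states
```

In this toy setting, the 16 sequences of length 2 reach only 9 distinct cells, so a uniformly random explorer spends part of its budget on sequences that land in the same state (taken here as an informal reading of "collisions"). A strategy informed by such equivalences could, in principle, reweight action choices to favor unvisited states.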