Is prioritized sweeping the better episodic control?
This work provides a theoretical comparison for researchers in reinforcement learning, but it is incremental as it builds on existing methods without introducing new paradigms.
The paper investigates the theoretical properties of episodic control in reinforcement learning. It shows that in deterministic tree Markov decision processes, episodic control is equivalent to a form of prioritized sweeping in sample efficiency and in memory and computation demands, whereas in general deterministic and stochastic environments prioritized sweeping performs better, even under the same memory and computation budget.
Episodic control has been proposed as a third approach to reinforcement learning, besides model-free and model-based control, by analogy with the three types of human memory, i.e., episodic, procedural and semantic memory. However, the theoretical properties of episodic control have not been well investigated. Here I show that in deterministic tree Markov decision processes, episodic control is equivalent to a form of prioritized sweeping in terms of sample efficiency as well as memory and computation demands. For general deterministic and stochastic environments, prioritized sweeping performs better even when memory and computation demands are restricted to be equal to those of episodic control. These results suggest generalizations of prioritized sweeping to partially observable environments, its combined use with function approximation and the search for possible implementations of prioritized sweeping in brains.
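
To make the claimed difference concrete, the following is a minimal sketch, not the algorithm analyzed in the paper. It assumes the max-return update commonly used in model-free episodic control and a simple deterministic variant of prioritized sweeping. The function names, the undiscounted setting (`GAMMA = 1.0`), the `max_backups` budget and the toy environment are all illustrative assumptions. The toy environment is deliberately not a tree: two start states, `S1` and `S2`, lead to the same state `M`.

```python
import heapq
from collections import defaultdict

GAMMA = 1.0  # assumption: undiscounted returns, to keep the numbers readable


def episodic_control_update(q, episode):
    """Keep, for each (state, action), the best return observed after it."""
    ret = 0.0
    for s, a, r, _ in reversed(episode):
        ret = r + GAMMA * ret
        q[(s, a)] = max(q.get((s, a), float("-inf")), ret)


def value(q, s):
    """Greedy value of s; terminal (None) and unseen states count as 0."""
    vals = [v for (s_, _), v in q.items() if s_ == s]
    return max(vals) if vals else 0.0


def prioritized_sweeping_update(q, model, preds, episode,
                                theta=1e-9, max_backups=100):
    """Store the episode in a deterministic model, then back up by priority."""
    for s, a, r, s2 in episode:
        model[(s, a)] = (r, s2)
        if s2 is not None:
            preds[s2].add((s, a))

    def priority(sa):
        r, s2 = model[sa]
        return abs(r + GAMMA * value(q, s2) - q.get(sa, 0.0))

    # Seed the queue with the pairs just visited (heapq is a min-heap, so negate).
    heap = [(-priority((s, a)), i, (s, a))
            for i, (s, a, _, _) in enumerate(episode)]
    heapq.heapify(heap)
    tick = len(episode)  # tie-breaker so heap entries never compare the pairs
    for _ in range(max_backups):
        if not heap:
            break
        _, _, sa = heapq.heappop(heap)
        r, s2 = model[sa]
        q[sa] = r + GAMMA * value(q, s2)
        # A changed value may make the predecessors of this state stale.
        for pred in preds[sa[0]]:
            p = priority(pred)
            if p > theta:
                heapq.heappush(heap, (-p, tick, pred))
                tick += 1


# Hypothetical toy MDP: transitions are (state, action, reward, next_state),
# with next_state None marking the end of the episode.
episode_1 = [("S1", "go", 0.0, "M"), ("M", "left", 0.0, None)]
episode_2 = [("S2", "go", 0.0, "M"), ("M", "right", 1.0, None)]

q_ec = {}
q_ps, model, preds = {}, {}, defaultdict(set)
for episode in (episode_1, episode_2):
    episodic_control_update(q_ec, episode)
    prioritized_sweeping_update(q_ps, model, preds, episode)

print("episodic control     Q(S1, go) =", q_ec[("S1", "go")])
print("prioritized sweeping Q(S1, go) =", q_ps[("S1", "go")])
```

In this sketch, episodic control leaves the value of `("S1", "go")` at the return of the first episode, because the better outcome from `M` was only observed in an episode that started in `S2`. The model-based backup propagates the improved value of `M` to both predecessors. In a tree, every state has a single predecessor path, so this situation cannot arise, which is consistent with the equivalence stated above for deterministic tree Markov decision processes.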