MICo: Improved representations via sampling-based state similarity for Markov decision processes
This work targets representation learning for reinforcement learning practitioners, though it appears incremental because it builds on existing notions of state similarity.
The paper tackles the problem of learning effective state representations in deep reinforcement learning. It introduces a new behavioural distance over the state space of a Markov decision process that, unlike existing notions of state similarity, is cheap enough to compute at scale and can be learned from samples. It reports strong results on the Arcade Learning Environment benchmark.
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.
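The abstract does not spell out the distance itself. As a rough illustration only, the sketch below assumes the recursive form usually given for MICo under a fixed policy: U(x, y) = |r_x - r_y| + gamma * E[U(x', y')], where x' and y' are drawn independently from the transition distributions of x and y. It computes this fixed point exactly on a small tabular problem rather than learning it from samples alongside a value function. The function name, argument layout, and toy problem are hypothetical and are not taken from the paper.

```python
def mico_fixed_point(rewards, transitions, gamma=0.9, iters=200):
    """Iterate the assumed update on a tabular problem with a fixed policy.

    rewards[x]        : expected reward in state x (policy already folded in)
    transitions[x][i] : probability of moving from state x to state i
    Returns an n-by-n nested list U of pairwise state distances.
    """
    n = len(rewards)
    U = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Next states are drawn independently for x and y, so the
        # expectation is a plain double sum with no optimal coupling.
        U = [
            [
                abs(rewards[x] - rewards[y])
                + gamma * sum(
                    transitions[x][i] * transitions[y][j] * U[i][j]
                    for i in range(n)
                    for j in range(n)
                )
                for y in range(n)
            ]
            for x in range(n)
        ]
    return U


# Toy problem (hypothetical): two absorbing states with rewards 0 and 1.
toy_rewards = [0.0, 1.0]
toy_transitions = [[1.0, 0.0], [0.0, 1.0]]
toy_U = mico_fixed_point(toy_rewards, toy_transitions, gamma=0.9)
```

In this toy problem the cross-state distance satisfies U = 1 + 0.9 U, so it converges to 1 / (1 - 0.9) = 10, while each state's distance to itself stays at 0. A sample-based variant would replace the double sum with a single sampled pair of next states, which is the property the abstract highlights as making the distance learnable at scale.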