LG, AI · Nov 7, 2022

Reward-Predictive Clustering

Lucas Lehnert, Michael J. Frank, Michael L. Littman

arXiv:2211.03281v1 · 1.8 · h-index: 89

Originality: Incremental advance

AI Analysis

This work addresses the challenge of building abstractions to speed up learning in reinforcement learning, though it is incremental as it extends existing tabular methods to deep learning.

The paper tackles the problem of accelerating reinforcement learning in new contexts by developing a clustering algorithm that enables reward-predictive state abstractions for deep learning settings, resulting in significantly faster learning in high-dimensional visual control tasks.

Recent advances in reinforcement-learning research have demonstrated impressive results in building algorithms that can outperform humans in complex tasks. Nevertheless, creating reinforcement-learning systems that can build abstractions of their experience to accelerate learning in new contexts remains an active area of research. Previous work showed that reward-predictive state abstractions fulfill this goal, but they have only been applied to tabular settings. Here, we provide a clustering algorithm that enables the application of such state abstractions to deep learning settings, providing compressed representations of an agent's inputs that preserve the ability to predict sequences of reward. A convergence theorem and simulations show that the resulting reward-predictive deep network maximally compresses the agent's inputs, significantly speeding up learning in high-dimensional visual control tasks. Furthermore, we present different generalization experiments and analyze under which conditions a pre-trained reward-predictive representation network can be re-used without re-training to accelerate learning -- a form of systematic out-of-distribution transfer.
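To make the idea of a reward-predictive state abstraction concrete, the following is a minimal tabular sketch, not the paper's deep clustering algorithm. It assumes a small deterministic MDP given as lookup tables, and it merges two states only if every action sequence yields the same reward sequence from both. The function name `reward_predictive_partition` and the table layout (`rewards[s][a]`, `next_state[s][a]`) are hypothetical choices made for illustration.

```python
from typing import List


def reward_predictive_partition(
    rewards: List[List[float]], next_state: List[List[int]]
) -> List[int]:
    """Cluster the states of a deterministic tabular MDP (illustrative only).

    rewards[s][a] is the reward for taking action a in state s, and
    next_state[s][a] is the resulting state. Two states end up in the
    same cluster only if all action sequences produce identical reward
    sequences from both of them.
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    cluster = [0] * n_states  # start with every state in one cluster

    while True:
        signatures = {}
        new_cluster = []
        for s in range(n_states):
            # A state's signature: its current cluster, plus for each action
            # the one-step reward and the cluster of the successor state.
            sig = (cluster[s],) + tuple(
                (rewards[s][a], cluster[next_state[s][a]])
                for a in range(n_actions)
            )
            # States with equal signatures share a cluster index.
            new_cluster.append(signatures.setdefault(sig, len(signatures)))
        # Each pass can only split clusters, so an unchanged cluster count
        # means the partition is stable.
        if len(signatures) == len(set(cluster)):
            return new_cluster
        cluster = new_cluster


# Toy MDP: states 0 and 1 behave identically, as do states 2 and 3.
toy_rewards = [[0, 0], [0, 0], [1, 0], [1, 0]]
toy_next = [[2, 1], [3, 0], [2, 0], [3, 1]]
print(reward_predictive_partition(toy_rewards, toy_next))  # → [0, 0, 1, 1]
```

In this toy case four states compress to two clusters while every reward sequence remains predictable from the cluster index alone. The abstract describes the analogous goal for deep networks operating on high-dimensional inputs, where such lookup tables are not available.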
