Recurrent World Models Facilitate Policy Evolution
This work addresses sample efficiency and generalization in reinforcement learning by combining unsupervised world modeling with evolutionary policy optimization.
The paper trains a generative recurrent neural network to model reinforcement learning environments, then evolves simple policies on the features that model extracts, achieving state-of-the-art results in various environments. It also trains agents entirely inside environments generated by their own world models and transfers the resulting policies back to the actual environments.
A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state-of-the-art results in various environments. We also train our agent entirely inside of an environment generated by its own internal world model, and transfer this policy back into the actual environment. An interactive version of the paper is available at https://worldmodels.github.io
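
To make the idea of "compact and simple policies trained by evolution" concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the paper's implementation: the names `linear_policy`, `make_toy_fitness`, and `evolve` are hypothetical, the features are random vectors standing in for what a trained world model would extract, the fitness is a toy score rather than an episode return, and the optimizer is a basic elite-averaging evolution strategy chosen for brevity.

```python
import numpy as np

N_FEATURES = 8   # assumed size of the world model's feature vector
N_ACTIONS = 2    # assumed number of continuous action dimensions


def linear_policy(params, features, n_actions):
    """Compact controller: one linear layer plus tanh, features -> actions."""
    n_features = features.shape[-1]
    split = n_features * n_actions
    W = params[:split].reshape(n_actions, n_features)
    b = params[split:]
    return np.tanh(features @ W.T + b)


def make_toy_fitness(n_samples=64, seed=1):
    """Toy stand-in for an episode return (an assumption of this sketch).

    A real setup would roll the policy out in the environment, or in the
    world model's generated environment, and return the cumulative reward.
    """
    rng = np.random.default_rng(seed)
    feats = rng.standard_normal((n_samples, N_FEATURES))
    n_params = N_FEATURES * N_ACTIONS + N_ACTIONS
    target_params = 0.5 * rng.standard_normal(n_params)
    targets = linear_policy(target_params, feats, N_ACTIONS)

    def fitness(params):
        preds = linear_policy(params, feats, N_ACTIONS)
        return -float(np.mean((preds - targets) ** 2))  # higher is better

    return fitness


def evolve(fitness_fn, n_params, pop_size=32, n_elite=8,
           sigma=0.1, generations=50, seed=0):
    """Simple evolution strategy: sample around a mean, average the elite."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(n_params)
    best_params, best_fit = mean.copy(), fitness_fn(mean)
    for _ in range(generations):
        noise = rng.standard_normal((pop_size, n_params))
        population = mean + sigma * noise
        scores = np.array([fitness_fn(p) for p in population])
        elite = population[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)  # move the search toward the best samples
        top = int(np.argmax(scores))
        if scores[top] > best_fit:
            best_fit, best_params = float(scores[top]), population[top].copy()
    return best_params, best_fit


fitness = make_toy_fitness()
n_params = N_FEATURES * N_ACTIONS + N_ACTIONS
best_params, best_fit = evolve(fitness, n_params)
print("initial fitness:", fitness(np.zeros(n_params)))
print("best fitness:   ", best_fit)
```

Because the controller has only a handful of parameters (18 in this sketch), a gradient-free search over them is cheap. Training "inside" the world model would amount to replacing the toy fitness with rollouts in the model's generated environment, then evaluating the evolved parameters in the actual environment.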