Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning
This work addresses the challenge of training complex transformer models in reinforcement learning. Its contribution is incremental, since it applies an existing evolution strategy to a new class of models.
The authors train transformer-based agents in reinforcement learning with evolution strategies and report strong results in the MuJoCo Humanoid and Atari environments, showing that the method can produce high-performing agents.
We explore whether evolution strategies can train an agent whose policy is based on a transformer architecture in a reinforcement learning setting. We used OpenAI's highly parallelizable evolution strategy to train a Decision Transformer in the MuJoCo Humanoid locomotion environment and in Atari games, testing whether this black-box optimization technique can train models that are large and complicated relative to those previously tested in the literature. The examined evolution strategy was, in general, capable of achieving strong results and produced high-performing agents, showing that evolution strategies can handle the training of such complex models.
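To make the optimization procedure concrete, the following is a minimal sketch of an OpenAI-style evolution strategy, written under several assumptions that do not come from the abstract. A three-parameter toy objective (`toy_fitness`, hypothetical) stands in for the episode return of a Decision Transformer policy. The hyperparameters (`half_pop`, `sigma`, `lr`, `iterations`) are illustrative only. Plain gradient ascent replaces whatever optimizer the experiments used. Mirrored (antithetic) sampling and rank-based fitness shaping are common ingredients of this family of methods, but whether the experiments used them in this exact form is an assumption.

```python
import numpy as np


def centered_ranks(values):
    """Replace raw fitness values by their ranks, rescaled to [-0.5, 0.5]."""
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks / (len(values) - 1) - 0.5


def openai_es(fitness, theta, iterations=300, half_pop=20,
              sigma=0.1, lr=0.03, seed=0):
    """Maximize `fitness` with a simplified OpenAI-style evolution strategy.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        # Sample Gaussian perturbations of the parameter vector.
        eps = rng.standard_normal((half_pop, theta.size))
        # Mirrored sampling: evaluate theta + sigma*eps and theta - sigma*eps.
        # Each evaluation is independent, which is what makes the method
        # easy to parallelize; here they simply run in a loop.
        f_pos = np.array([fitness(theta + sigma * e) for e in eps])
        f_neg = np.array([fitness(theta - sigma * e) for e in eps])
        # Fitness shaping: only the ordering of the returns matters.
        shaped = centered_ranks(np.concatenate([f_pos, f_neg]))
        weights = shaped[:half_pop] - shaped[half_pop:]
        # Search-gradient estimate, followed by a plain gradient-ascent step.
        grad = weights @ eps / (2 * half_pop * sigma)
        theta = theta + lr * grad
    return theta


def toy_fitness(params):
    """Hypothetical stand-in for an episode return; maximal at `target`."""
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((params - target) ** 2)


theta_final = openai_es(toy_fitness, np.zeros(3))
print("fitness before:", toy_fitness(np.zeros(3)))
print("fitness after: ", toy_fitness(theta_final))
```

In the setting described above, `theta` would be the flattened weights of the transformer policy and `fitness` would run one or more episodes in the environment and return the total reward. The update needs only these scalar returns, not gradients through the model, which is why the method is described as black-box.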