LG · ML · Jul 4, 2022

Goal-Conditioned Generators of Deep Policies

Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

arXiv:2207.01570v1 · h-index: 100 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of generating policies that adapt to different goals in reinforcement learning. It is an incremental advance that scales existing ideas such as HyperNetworks to the generation of deep neural network policies.

The paper tackles the problem of learning goal-conditioned policies in reinforcement learning by introducing neural networks that generate deep policies as context-specific weight matrices, achieving competitive performance on continuous control tasks.
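To make "generating a policy as context-specific weight matrices" concrete, here is a minimal pure-Python sketch of a command-conditioned generator with weight sharing, loosely in the spirit of HyperNetworks. It is not the authors' architecture: the chunking scheme, the sizes, the tanh outputs, the command normalization, and every name (`ChunkedGenerator`, `run_policy`, `CHUNK`, `EMB`, `RETURN_SCALE`) are illustrative assumptions. One shared linear layer maps the desired return, together with a learned per-chunk embedding, to one chunk of the policy's flat parameter vector. That vector is then unpacked into the weight matrices and biases of a small MLP policy.

```python
import math
import random

OBS_DIM, HIDDEN, ACT_DIM = 3, 8, 2  # hypothetical policy sizes
CHUNK = 16                          # parameters emitted per generator call (assumed)
EMB = 4                             # size of each learned chunk embedding (assumed)
RETURN_SCALE = 100.0                # normalizes the return command (assumed)

# Flat layout of the generated policy: W1, b1, W2, b2.
N_PARAMS = HIDDEN * OBS_DIM + HIDDEN + ACT_DIM * HIDDEN + ACT_DIM


class ChunkedGenerator:
    """Hypothetical weight-sharing generator: one shared linear map turns
    (command, chunk embedding) into one chunk of policy parameters."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.n_chunks = math.ceil(N_PARAMS / CHUNK)
        # The only per-chunk parameters: one embedding per chunk.
        self.emb = [[rng.gauss(0.0, 1.0) for _ in range(EMB)]
                    for _ in range(self.n_chunks)]
        # Shared map from [command, embedding] to CHUNK outputs.
        self.W = [[rng.gauss(0.0, 0.5) for _ in range(1 + EMB)]
                  for _ in range(CHUNK)]
        self.b = [0.0] * CHUNK

    def __call__(self, command):
        flat = []
        for e in self.emb:
            x = [command / RETURN_SCALE] + e
            flat.extend(math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
                        for row, bi in zip(self.W, self.b))
        return flat[:N_PARAMS]  # drop padding produced by the last chunk


def run_policy(flat, obs):
    """Unpack the flat vector into weight matrices and run the tanh MLP."""
    it = iter(flat)

    def take(k):
        return [next(it) for _ in range(k)]

    W1 = [take(OBS_DIM) for _ in range(HIDDEN)]
    b1 = take(HIDDEN)
    W2 = [take(HIDDEN) for _ in range(ACT_DIM)]
    b2 = take(ACT_DIM)
    h = [math.tanh(sum(w * o for w, o in zip(row, obs)) + b)
         for row, b in zip(W1, b1)]
    return [math.tanh(sum(w * v for w, v in zip(row, h)) + b)
            for row, b in zip(W2, b2)]
```

For example, `theta = ChunkedGenerator()(150.0)` requests a policy for a return of 150, and `run_policy(theta, [0.1, -0.2, 0.3])` returns a two-dimensional action. At these toy sizes the generator (112 parameters) is larger than the 50-parameter policy it emits. The point of sharing is that each additional chunk costs only `EMB` extra parameters, so the generator grows slowly as the generated policy gets deeper or wider.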

Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks, combined with policy embeddings, scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance. Our code is public.
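The iterative loop the abstract describes (command a return, generate a policy, evaluate it, learn from what was actually achieved) can be sketched as follows. This is a deliberately simplified stand-in, not the paper's training procedure. The following are all assumptions chosen for brevity: the toy objective `episode_return`, the per-parameter linear generator fitted by least squares, relabeling each command in hindsight with the return actually achieved, Gaussian parameter noise as exploration, and a top-K buffer. The constants `DIM`, `NOISE`, `TOP_K`, and `STEP` are hypothetical as well.

```python
import random

DIM = 2      # hypothetical number of policy parameters
NOISE = 0.3  # std of parameter-space exploration noise (assumed)
TOP_K = 20   # generator is fit on the K best (return, params) pairs (assumed)
STEP = 0.1   # how far above the best return seen to command (assumed)


def episode_return(theta):
    """Toy stand-in for an environment rollout; optimum at theta = (1, ..., 1)."""
    return -sum((t - 1.0) ** 2 for t in theta)


def fit_generator(buffer):
    """Per-parameter least-squares line theta_i = a_i * command + b_i, fit on
    (achieved return, params) pairs, i.e. commands relabeled in hindsight."""
    cs = [r for r, _ in buffer]
    mc = sum(cs) / len(cs)
    var = sum((c - mc) ** 2 for c in cs)
    coeffs = []
    for i in range(DIM):
        ts = [theta[i] for _, theta in buffer]
        mt = sum(ts) / len(ts)
        cov = sum((c - mc) * (t - mt) for c, t in zip(cs, ts))
        a = cov / var if var > 0 else 0.0
        coeffs.append((a, mt - a * mc))
    return lambda command: [a * command + b for a, b in coeffs]


def train(iterations=200, seed=0):
    rng = random.Random(seed)
    generator = lambda command: [0.0] * DIM  # untrained generator
    buffer, history = [], []
    for _ in range(iterations):
        best = max((r for r, _ in buffer), default=episode_return([0.0] * DIM))
        command = best + STEP  # ask for more than has been achieved so far
        # Explore in parameter space around the generated policy.
        theta = [t + rng.gauss(0.0, NOISE) for t in generator(command)]
        ret = episode_return(theta)
        buffer.append((ret, theta))
        buffer = sorted(buffer, key=lambda p: p[0], reverse=True)[:TOP_K]
        generator = fit_generator(buffer)  # learn from what was achieved
        history.append(ret)
    return generator, buffer, history
```

After `generator, buffer, history = train()`, the fitted `generator(r)` can be queried with any return `r` inside the range covered by the buffer. This mirrors, in a very reduced form, the idea of one generator serving many return commands. A realistic version would replace the linear map with a neural network and the toy objective with environment rollouts.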

