Learning to Compose Skills
This work addresses the problem of efficiently composing basic behaviors into more complex ones for reinforcement learning agents, offering an incremental improvement in skill generalization.
This paper introduces a differentiable framework for learning to compose simple policies, called skills, into complex hierarchical behaviors. The framework was tested on collect and evade tasks, demonstrating its ability to quickly build complex skills and achieve zero-shot generalization to unseen skill combinations.
We present a differentiable framework capable of learning a wide variety of compositions of simple policies, which we call skills. By recursively composing skills with themselves, we can create hierarchies that display complex behavior. Skill networks are trained to generate skill-state embeddings that are provided as inputs to a trainable composition function, which in turn outputs a policy for the overall task. Our experiments on an environment consisting of multiple collect and evade tasks show that this architecture is able to quickly build complex skills from simpler ones. Furthermore, the learned composition function displays some transfer to unseen combinations of skills, allowing for zero-shot generalization.
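The data flow described above (skill networks produce skill-state embeddings, a composition function combines them, and the result drives a policy) can be sketched as a forward pass. This is a minimal illustration under our own assumptions, not the authors' implementation: the class names `Skill`, `ComposedSkill`, and `PolicyHead`, the single tanh layers, the concatenation of child embeddings, the separate softmax policy head, the dimensions, and the random weights are all hypothetical. Training is omitted; in the differentiable framework the composition weights would be learned by backpropagating through this computation, which this pure-Python sketch does not do.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float], x: Sequence[float]) -> Vector:
    """Compute W @ x + b using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_layer(rng: random.Random, out_dim: int, in_dim: int) -> Layer:
    """Hypothetical initializer: small uniform weights and zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class Skill:
    """Hypothetical primitive skill: maps a state to a skill-state embedding."""

    def __init__(self, rng: random.Random, state_dim: int, embed_dim: int):
        self.w, self.b = random_layer(rng, embed_dim, state_dim)

    def embed(self, state: Sequence[float]) -> Vector:
        return [math.tanh(v) for v in linear(self.w, self.b, state)]


class ComposedSkill:
    """Hypothetical composition function over the embeddings of child skills.

    It exposes the same `embed` method as `Skill`, so a composed skill can
    itself be a child of another composition (recursive composition).
    """

    def __init__(self, rng: random.Random, children: list, embed_dim: int):
        self.children = children
        self.w, self.b = random_layer(rng, embed_dim, embed_dim * len(children))

    def embed(self, state: Sequence[float]) -> Vector:
        # Concatenate the children's skill-state embeddings for this state.
        joint = [v for child in self.children for v in child.embed(state)]
        return [math.tanh(v) for v in linear(self.w, self.b, joint)]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PolicyHead:
    """Hypothetical head turning a (composed) skill embedding into action probabilities."""

    def __init__(self, rng: random.Random, embed_dim: int, n_actions: int):
        self.w, self.b = random_layer(rng, n_actions, embed_dim)

    def __call__(self, skill, state: Sequence[float]) -> Vector:
        return softmax(linear(self.w, self.b, skill.embed(state)))


# Usage: compose two primitive skills, then compose the result again.
rng = random.Random(0)
STATE_DIM, EMBED_DIM, N_ACTIONS = 4, 8, 5
collect = Skill(rng, STATE_DIM, EMBED_DIM)
evade = Skill(rng, STATE_DIM, EMBED_DIM)
collect_and_evade = ComposedSkill(rng, [collect, evade], EMBED_DIM)
nested = ComposedSkill(rng, [collect_and_evade, collect], EMBED_DIM)
head = PolicyHead(rng, EMBED_DIM, N_ACTIONS)
probs = head(nested, [0.1, -0.3, 0.5, 0.0])
print(len(probs), round(sum(probs), 6))  # prints: 5 1.0
```

Because every skill, primitive or composed, returns an embedding of the same size, compositions can be nested to any depth; whether the original work shares this exact interface is an assumption of the sketch.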