Intrinsically Motivated Open-Ended Multi-Task Learning Using Transfer Learning to Discover Task Hierarchy
This work addresses scalable and efficient multi-task learning for robots in industrial settings, though it appears incremental, building on existing hierarchical reinforcement learning and transfer learning techniques.
The paper tackles the problem of robots learning multiple hierarchical control tasks in open-ended environments. It proposes a method that uses transfer learning to discover task hierarchies, so that complex tasks are learned more efficiently by transferring knowledge from simpler ones. The results indicate that task composition and decomposition are key: the robot learns tasks of varying complexity from a small number of demonstrations and efficiently transfers knowledge across tasks and learners.
In open-ended continuous environments, robots need to learn multiple parameterised control tasks in hierarchical reinforcement learning. We hypothesise that the most complex tasks can be learned more easily by transferring knowledge from simpler tasks, and faster by adapting the complexity of the actions to the task. We propose a task-oriented representation of complex actions, called procedures, to learn online task relationships and unbounded sequences of action primitives to control the different observables of the environment. Combining goal-babbling with imitation learning, and active learning with knowledge transfer based on intrinsic motivation, our algorithm self-organises its learning process. At any given time, it chooses which task to focus on, and what, how, when and from whom to transfer knowledge. We show, in simulation and on a real industrial robot arm, in cross-task and cross-learner transfer settings, that task composition is key to tackling highly complex tasks. Task decomposition is also efficiently transferred across different embodied learners and by active imitation, where the robot requests just a small number of demonstrations and the appropriate type of information. The robot learns and exploits task dependencies so as to learn tasks of every complexity.
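To make the idea of choosing "a task to focus on" from intrinsic motivation more concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes a common learning-progress heuristic (the drop in error over a sliding window) as the intrinsic reward and an epsilon-greedy choice between tasks. The names `ProgressTracker` and `choose_task`, the task labels, the window size and the exploration rate are all hypothetical and chosen only for illustration.

```python
import random
from collections import deque


class ProgressTracker:
    """Hypothetical helper: sliding window of recent errors for one task."""

    def __init__(self, window=10):
        self.errors = deque(maxlen=window)

    def record(self, error):
        self.errors.append(error)

    def progress(self):
        # Assumed learning-progress measure: mean error of the older half
        # of the window minus mean error of the newer half.
        # A positive value means the task is currently improving.
        n = len(self.errors)
        if n < 2:
            return 0.0
        errs = list(self.errors)
        half = n // 2
        old, new = errs[:half], errs[half:]
        return sum(old) / len(old) - sum(new) / len(new)


def choose_task(trackers, epsilon=0.1, rng=random):
    """Pick the task with the largest absolute learning progress.

    With probability epsilon a task is drawn uniformly at random, so that
    tasks with no measured progress yet are still visited occasionally.
    """
    tasks = sorted(trackers)
    if rng.random() < epsilon:
        return rng.choice(tasks)
    return max(tasks, key=lambda t: abs(trackers[t].progress()))


# Usage: "reach" is improving while "push" is stagnating.
trackers = {"reach": ProgressTracker(), "push": ProgressTracker()}
for e in [1.0, 0.9, 0.5, 0.2]:
    trackers["reach"].record(e)
for e in [1.0, 1.0, 1.0, 1.0]:
    trackers["push"].record(e)
selected = choose_task(trackers, epsilon=0.0)  # greedy choice picks "reach"
```

In a fuller system, the same progress signal could also score the other choices the abstract mentions (what, how, when and from whom to transfer), for example by keeping one tracker per pair of task and learning strategy. That extension is an assumption and is not shown here.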