NE · LG · Feb 20, 2017

Learning to Multi-Task by Active Sampling

arXiv:1702.06053v4 · 32 citations
AI Analysis

This work addresses the problem of building a single agent that can solve multiple tasks. Its approach avoids the task-specific expert networks used by prior distillation-based methods, along with the extensive data and computation needed to train them.

The paper tackles multi-task learning for goal-directed sequential problems by proposing an on-line framework that uses active sampling to train on harder tasks more often than easier ones, with no expert supervision. It reports results in the Atari 2600 domain on seven multi-tasking instances ranging from 6 to 21 tasks, and describes its adaptive method as achieving extremely competitive multi-tasking performance.

One of the long-standing challenges in Artificial Intelligence for learning goal-directed behavior is to build a single agent which can solve multiple tasks. Recent progress in multi-task learning for goal-directed sequential problems has been in the form of distillation-based learning, wherein a student network learns from multiple task-specific expert networks by mimicking their task-specific policies. While such approaches offer a promising solution to the multi-task learning problem, they require supervision from large expert networks, which in turn require extensive data and computation time for training. In this work, we propose an efficient multi-task learning framework which solves multiple goal-directed tasks in an on-line setup without the need for expert supervision. Our work uses active learning principles to achieve multi-task learning by sampling the harder tasks more than the easier ones. We propose three distinct models under our active sampling framework: (i) an adaptive method with extremely competitive multi-tasking performance; (ii) a UCB-based meta-learner which casts the problem of picking the next task to train on as a multi-armed bandit problem; and (iii) a meta-learning method that casts the next-task picking problem as a full Reinforcement Learning problem and uses actor-critic methods for optimizing the multi-tasking performance directly. We demonstrate results in the Atari 2600 domain on seven multi-tasking instances: three 6-task instances, one 8-task instance, two 12-task instances and one 21-task instance.
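
The abstract does not spell out how harder tasks are sampled more often, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that task difficulty is measured as the normalized gap between a hypothetical per-task target score and the agent's recent average score, that the adaptive variant turns those gaps into sampling probabilities with a softmax, and that the bandit variant uses the standard UCB1 rule. The names `adaptive_task_probs`, `ucb1_pick`, `sample_task`, `temperature`, and `c` are illustrative.

```python
import math
import random


def adaptive_task_probs(avg_scores, target_scores, temperature=0.1):
    """Softmax over normalized performance gaps (assumed difficulty measure).

    A larger gap between target and current score means a harder task,
    which then receives a higher sampling probability.
    """
    gaps = []
    for avg, target in zip(avg_scores, target_scores):
        gap = (target - avg) / target  # assumes target > 0
        gaps.append(min(max(gap, 0.0), 1.0))  # clip to [0, 1]
    # Subtract the maximum logit for numerical stability.
    max_logit = max(g / temperature for g in gaps)
    weights = [math.exp(g / temperature - max_logit) for g in gaps]
    total = sum(weights)
    return [w / total for w in weights]


def sample_task(probs, rng=random):
    """Draw the index of the next task to train on."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def ucb1_pick(mean_rewards, counts, c=2.0):
    """Standard UCB1 arm selection, with one arm per task.

    Untried tasks are picked first; otherwise the task with the largest
    mean reward plus exploration bonus is chosen. Here the bandit reward
    is assumed to grow with task difficulty (for example, the gap above).
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [
        mu + math.sqrt(c * math.log(total) / n)
        for mu, n in zip(mean_rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Example: task 0 is far from its target, task 1 is nearly solved.
probs = adaptive_task_probs([50.0, 90.0], [100.0, 100.0])
next_task = sample_task(probs)
```

In this sketch a lower `temperature` concentrates training on the hardest tasks, while a higher value moves the distribution toward uniform sampling.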
