LG · AI · RO · Feb 1, 2023

QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing

arXiv:2302.00671v3 · 6 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses sample efficiency in multi-task reinforcement learning, a concern for researchers and practitioners alike, and offers an incremental improvement by adding behavioral policy sharing on top of existing methods.

The paper tackles sample efficiency in multi-task reinforcement learning by introducing a framework for sharing behavioral policies across tasks. The shared behaviors yield higher-quality trajectories and provide gains complementary to existing methods in manipulation, locomotion, and navigation environments.

Multi-task reinforcement learning (MTRL) aims to learn several tasks simultaneously for better sample efficiency than learning them separately. Traditional methods achieve this by sharing parameters or relabeled data between tasks. In this work, we introduce a new framework for sharing behavioral policies across tasks, which can be used in addition to existing MTRL methods. The key idea is to improve each task's off-policy data collection by employing behaviors from other task policies. Selectively sharing helpful behaviors acquired in one task to collect training data for another task can yield higher-quality trajectories and, in turn, more sample-efficient MTRL. To this end, we introduce a simple and principled framework called Q-switch mixture of policies (QMP) that selectively shares behavior between different task policies by using the task's Q-function to evaluate and select useful shareable behaviors. We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm. Our experiments show that QMP's behavioral policy sharing provides complementary gains over many popular MTRL algorithms and outperforms alternative ways to share behaviors in various manipulation, locomotion, and navigation environments. Videos are available at https://qmp-mtrl.github.io.
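To make the Q-switch idea concrete, here is a minimal Python sketch of how one task's behavior policy could choose among actions proposed by all task policies, keeping the candidate that the current task's Q-function rates highest. This is an illustration under stated assumptions, not the authors' implementation: policies are treated as deterministic callables (the paper's policies may be stochastic), and the names `qmp_select_action`, `collect_step`, `step_fn`, and the toy Q-function are hypothetical. The selected action is used only to collect data into the task's own off-policy buffer; training of each task policy is left to the underlying RL algorithm.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases: a deterministic policy maps a state to an action,
# and a task Q-function scores a (state, action) pair.
Policy = Callable[[np.ndarray], np.ndarray]
QFunction = Callable[[np.ndarray, np.ndarray], float]


def qmp_select_action(
    state: np.ndarray, task_q: QFunction, policies: Sequence[Policy]
) -> Tuple[int, np.ndarray]:
    """Q-switch: every task policy proposes an action; keep the one that
    the current task's Q-function rates highest."""
    candidates = [pi(state) for pi in policies]
    scores = [task_q(state, a) for a in candidates]
    best = int(np.argmax(scores))
    return best, candidates[best]


def collect_step(state, task_id, q_functions, policies, step_fn, buffer: List):
    """One data-collection step for task `task_id` with the QMP behavior policy.

    `step_fn(state, action) -> (next_state, reward, done)` is a hypothetical
    environment interface. The transition is stored in the task's own
    off-policy buffer for the underlying RL algorithm to train on.
    """
    _, action = qmp_select_action(state, q_functions[task_id], policies)
    next_state, reward, done = step_fn(state, action)
    buffer.append((state, action, reward, next_state, done))
    return next_state, done


# Toy usage: task 0's (hypothetical) Q-function prefers actions near 0.5.
target = np.array([0.5])


def q_task0(s: np.ndarray, a: np.ndarray) -> float:
    return -float(np.sum((a - target) ** 2))


policies = [
    lambda s: np.array([-1.0]),  # task 0's own policy
    lambda s: np.array([0.4]),   # task 1's policy
    lambda s: np.array([2.0]),   # task 2's policy
]

idx, action = qmp_select_action(np.zeros(1), q_task0, policies)
print(idx, action)  # 1 [0.4]
```

In this toy case, task 1's proposal scores highest under task 0's Q-function, so task 0 collects data with task 1's behavior at that state. Because the switch is driven by the collecting task's own Q-function, behaviors from other tasks are shared only where that task already estimates them to be useful.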
