LG · May 14

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

arXiv:2605.14350 · 12.8
Predicted impact: top 53% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For multi-task RL practitioners, DRATS offers a simple task sampling method that mitigates performance imbalance without gradient manipulation or specialized architectures.

DRATS addresses imbalanced data allocation in multi-task RL by adaptively sampling tasks furthest from being solved, improving worst-task performance and data efficiency on MetaWorld benchmarks.

Multi-task reinforcement learning (MTRL) aims to train a single agent to efficiently optimize performance across multiple tasks simultaneously. However, jointly optimizing all tasks often yields imbalanced learning: agents quickly solve easy tasks but learn slowly on harder ones. Prior work primarily attributes this imbalance to conflicting task gradients and proposes gradient manipulation or specialized architectures to address it. We instead focus on a distinct and under-explored challenge: imbalanced data allocation. Standard MTRL allocates an equal number of environment interactions to each task, which over-allocates data to easy tasks that need relatively few interactions and under-allocates data to hard tasks that need substantially more experience. To address this challenge, we introduce Distributionally Robust Adaptive Task Sampling (DRATS), an algorithm that adaptively prioritizes sampling the tasks furthest from being solved. We derive DRATS by formalizing MTRL as a feasibility problem, which yields a minimax objective that minimizes the worst-case return gap, defined as the difference between a desired target return and the agent's return on a task. On benchmarks such as MetaWorld-MT10 and MT50, DRATS improves data efficiency and increases worst-task performance compared to existing task sampling algorithms.
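
The idea of prioritizing tasks by their return gap can be illustrated with a minimal sketch. The abstract does not give the exact DRATS update rule, so the code below is an assumption: it uses an exponentiated-gradient (multiplicative-weights) step, a common way to handle the inner maximization over task weights in minimax objectives. The function names, the `step_size` parameter, and the example returns are hypothetical and chosen only for illustration.

```python
import math
import random


def return_gaps(target_returns, current_returns):
    """Return gap per task: target return minus current return, floored at 0."""
    return [max(t - r, 0.0) for t, r in zip(target_returns, current_returns)]


def update_task_weights(weights, gaps, step_size=1.0):
    """One exponentiated-gradient step on the probability simplex.

    Assumption: this is a generic update for the inner max of a minimax
    objective, not necessarily the rule used by DRATS.
    """
    logits = [step_size * g for g in gaps]
    shift = max(logits)  # subtract the max before exp() for numerical stability
    scaled = [w * math.exp(l - shift) for w, l in zip(weights, logits)]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_task(weights, rng):
    """Draw one task index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Hypothetical example: three tasks with the same target return.
targets = [1.0, 1.0, 1.0]
returns = [0.9, 0.5, 0.1]        # task 2 is furthest from its target
weights = [1.0 / 3.0] * 3        # start from uniform sampling

gaps = return_gaps(targets, returns)
weights = update_task_weights(weights, gaps, step_size=2.0)

rng = random.Random(0)
task_id = sample_task(weights, rng)  # index of the next task to collect data on
```

After the update, the sampling weights are ordered by return gap, so the task furthest from its target receives the largest share of new environment interactions, while uniform sampling would keep giving each task an equal share.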

