LGApr 4

Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards

arXiv:2604.0389132.0h-index: 8
Predicted impact top 72% in LG · last 90 daysOriginality Highly original
AI Analysis

For multi-task RL with linear MDPs, this work provides the first theoretical guarantees for low-rank matrix recovery under general feature distributions, addressing a key bottleneck in representation learning for RL.

This paper introduces a provable multi-task reinforcement learning framework that leverages low-rank reward structures to learn shared representations across tasks, achieving near-optimal policies with improved sample efficiency. The method relaxes restrictive assumptions like Gaussian features and demonstrates robust performance in experiments.

Multi-task representation learning (MTRL) is an approach that learns shared latent representations across related tasks, facilitating collaborative learning that improves the overall learning efficiency. This paper studies MTRL for multi-task reinforcement learning (RL), where multiple tasks have the same state-action space and transition probabilities, but different rewards. We consider T linear Markov Decision Processes (MDPs) where the reward functions and transition dynamics admit linear feature embeddings of dimension d. The relatedness among the tasks is captured by a low-rank structure on the reward matrices. Learning shared representations across multiple RL tasks is challenging due to the complex and policy-dependent nature of data that leads to a temporal progression of error. Our approach adopts a reward-free reinforcement learning framework to first learn a data-collection policy. This policy then informs an exploration strategy for estimating the unknown reward matrices. Importantly, the data collected under this well-designed policy enable accurate estimation, which ultimately supports the learning of an near-optimal policy. Unlike existing approaches that rely on restrictive assumptions such as Gaussian features, incoherence conditions, or access to optimal solutions, we propose a low-rank matrix estimation method that operates under more general feature distributions encountered in RL settings. Theoretical analysis establishes that accurate low-rank matrix recovery is achievable under these relaxed assumptions, and we characterize the relationship between representation error and sample complexity. Leveraging the learned representation, we construct near-optimal policies and prove a regret bound. Experimental results demonstrate that our method effectively learns robust shared representations and task dynamics from finite data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes