LGApr 4

Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards

arXiv:2604.0389132.0h-index: 8

Predicted impact top 72% in LG · last 90 daysOriginality Highly original

AI Analysis

For multi-task RL with linear MDPs, this work provides the first theoretical guarantees for low-rank matrix recovery under general feature distributions, addressing a key bottleneck in representation learning for RL.

This paper introduces a provable multi-task reinforcement learning framework that leverages low-rank reward structures to learn shared representations across tasks, achieving near-optimal policies with improved sample efficiency. The method relaxes restrictive assumptions like Gaussian features and demonstrates robust performance in experiments.

Multi-task representation learning (MTRL) is an approach that learns shared latent representations across related tasks, facilitating collaborative learning that improves the overall learning efficiency. This paper studies MTRL for multi-task reinforcement learning (RL), where multiple tasks have the same state-action space and transition probabilities, but different rewards. We consider T linear Markov Decision Processes (MDPs) where the reward functions and transition dynamics admit linear feature embeddings of dimension d. The relatedness among the tasks is captured by a low-rank structure on the reward matrices. Learning shared representations across multiple RL tasks is challenging due to the complex and policy-dependent nature of data that leads to a temporal progression of error. Our approach adopts a reward-free reinforcement learning framework to first learn a data-collection policy. This policy then informs an exploration strategy for estimating the unknown reward matrices. Importantly, the data collected under this well-designed policy enable accurate estimation, which ultimately supports the learning of an near-optimal policy. Unlike existing approaches that rely on restrictive assumptions such as Gaussian features, incoherence conditions, or access to optimal solutions, we propose a low-rank matrix estimation method that operates under more general feature distributions encountered in RL settings. Theoretical analysis establishes that accurate low-rank matrix recovery is achievable under these relaxed assumptions, and we characterize the relationship between representation error and sample complexity. Leveraging the learned representation, we construct near-optimal policies and prove a regret bound. Experimental results demonstrate that our method effectively learns robust shared representations and task dynamics from finite data.

View on arXiv PDF

Similar