LG · Sep 5, 2025

Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning

arXiv:2509.05193v2 · 3 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the problem of enabling low-rank representations in reinforcement learning. It is an incremental advance: it refines existing assumptions and methods to make learning more efficient.

The paper challenges the assumption that the successor measure in reinforcement learning is low-rank. It shows that low-rank structure instead emerges in a shifted version of the successor measure, obtained by bypassing a few initial transitions, and it provides finite-sample guarantees for estimating that shifted measure, with errors governed by spectral recoverability. Experiments support the theory and show improved performance in goal-conditioned RL.

Low-rank structure is a common implicit assumption in many modern reinforcement learning (RL) algorithms. For instance, reward-free and goal-conditioned RL methods often presume that the successor measure admits a low-rank representation. In this work, we challenge this assumption by first remarking that the successor measure itself is not approximately low-rank. Instead, we demonstrate that a low-rank structure naturally emerges in the shifted successor measure, which captures the system dynamics after bypassing a few initial transitions. We provide finite-sample performance guarantees for the entry-wise estimation of a low-rank approximation of the shifted successor measure from sampled entries. Our analysis reveals that both the approximation and estimation errors are primarily governed by a newly introduced quantity: the spectral recoverability of the corresponding matrix. To bound this parameter, we derive a new class of functional inequalities for Markov chains that we call Type II Poincaré inequalities and from which we can quantify the amount of shift needed for effective low-rank approximation and estimation. This analysis shows in particular that the required shift depends on the decay of the high-order singular values of the shifted successor measure and is hence typically small in practice. Additionally, we establish a connection between the necessary shift and the local mixing properties of the underlying dynamical system, which provides a natural way of selecting the shift. Finally, we validate our theoretical findings with experiments, and demonstrate that shifting the successor measure indeed leads to improved performance in goal-conditioned RL.
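The effect described in the abstract can be illustrated numerically on a toy Markov chain. The sketch below is not the paper's method or its exact definitions, which the abstract does not state. It assumes a tabular chain with transition matrix P, takes the successor measure to be M = (I - γP)^{-1} = Σ_t γ^t P^t, and takes the shifted version to be P^k M, i.e. the same sum started after k transitions. The lazy ring walk and the names `random_walk_ring`, `shifted_successor_measure`, and `relative_tail_energy` are hypothetical, chosen only for illustration.

```python
import numpy as np


def random_walk_ring(n, p_stay=0.5):
    """Transition matrix of a lazy random walk on a ring of n states (toy example)."""
    P = np.zeros((n, n))
    for s in range(n):
        P[s, s] = p_stay
        P[s, (s - 1) % n] += (1 - p_stay) / 2
        P[s, (s + 1) % n] += (1 - p_stay) / 2
    return P


def shifted_successor_measure(P, gamma, k):
    """Assumed definition: P^k (I - gamma P)^{-1} = sum_t gamma^t P^(t+k).

    k = 0 gives the unshifted successor measure under this convention.
    """
    n = P.shape[0]
    M = np.linalg.inv(np.eye(n) - gamma * P)
    return np.linalg.matrix_power(P, k) @ M


def relative_tail_energy(A, r):
    """Relative Frobenius error of the best rank-r approximation of A.

    By the Eckart-Young theorem this is the energy in the singular
    values beyond the r-th, divided by the total energy.
    """
    s = np.linalg.svd(A, compute_uv=False)
    return np.sqrt(np.sum(s[r:] ** 2) / np.sum(s ** 2))


if __name__ == "__main__":
    P = random_walk_ring(50)
    for k in (0, 2, 8, 32):
        err = relative_tail_energy(shifted_successor_measure(P, 0.9, k), r=5)
        print(f"shift k={k:2d}: relative rank-5 approximation error = {err:.4f}")
```

Under these assumptions the unshifted matrix contains the identity term of the series, so all of its singular values are bounded away from zero, while multiplying by P^k damps the directions that mix quickly. On this chain the rank-5 approximation error therefore shrinks as k grows. How large a shift is needed in general is what the paper's spectral recoverability analysis addresses; this toy example does not reproduce that analysis.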

