AI, Feb 3

Structuring Value Representations via Geometric Coherence in Markov Decision Processes

arXiv:2602.02978v1, 7 citations, h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses sample efficiency and stability in reinforcement learning. Its geometric, order-theoretic framing is novel, but as an addition to existing RL methods it is an incremental advance.

The paper aims to stabilize and speed up reinforcement learning by recasting value function estimation as learning a partially ordered set (poset). It proposes GCR-RL, which enforces geometric coherence across a sequence of poset refinements, and reports significant improvements in sample efficiency and stable performance over strong baselines on a range of tasks.

Geometric properties can be leveraged to stabilize and speed up reinforcement learning (RL). Existing examples include encoding symmetry structure, geometry-aware data augmentation, and enforcing structural restrictions. In this paper, we take a novel view of RL through the lens of order theory and recast value function estimation as learning a desired poset (partially ordered set). We propose GCR-RL (Geometric Coherence Regularized Reinforcement Learning), which computes a sequence of super-poset refinements by refining the posets from previous steps and learning additional order relationships from temporal-difference signals, thus ensuring geometric coherence across the sequence of posets underpinning the learned value functions. Two novel algorithms, one based on Q-learning and one on actor-critic, are developed to realize these super-poset refinements efficiently. Their theoretical properties and convergence rates are analyzed. We empirically evaluate GCR-RL on a range of tasks and demonstrate significant improvements in sample efficiency and stable performance over strong baselines.
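To make the order-theoretic vocabulary concrete, here is a minimal Python sketch of what a "super-poset refinement" could look like on a toy state space. It is not the paper's algorithm. It assumes that a refinement is a strict partial order containing every pair of the previous one, and that new order pairs come from value estimates that differ by at least a margin. The helper names (`transitive_closure`, `pairs_from_values`, `refine`) and the `margin` parameter are hypothetical.

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a set of (lower, higher) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_strict_partial_order(pairs):
    """Check irreflexivity and asymmetry of a transitively closed relation."""
    return all(a != b and (b, a) not in pairs for a, b in pairs)


def pairs_from_values(values, margin):
    """Hypothetical rule: propose a < b when the value estimates differ by >= margin."""
    return [(a, b) for a in values for b in values
            if values[b] - values[a] >= margin]


def refine(poset, new_pairs):
    """Add new pairs one by one, skipping any that would break the partial order."""
    refined = set(poset)
    for pair in new_pairs:
        candidate = transitive_closure(refined | {pair})
        if is_strict_partial_order(candidate):
            refined = candidate
    return refined


# Toy example: three states with illustrative (made-up) value estimates.
poset_0 = transitive_closure({("s0", "s1")})
values = {"s0": 0.1, "s1": 0.5, "s2": 0.9}
poset_1 = refine(poset_0, pairs_from_values(values, margin=0.2))

# Coherence in the assumed sense: the old relation survives in the new one.
assert poset_0 <= poset_1
```

Under this reading, a pair that contradicts the existing order (for example proposing `("s1", "s0")` after `("s0", "s1")` is established) is rejected rather than overwriting earlier relations. How GCR-RL actually resolves such conflicts is not stated in the abstract.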

