LG · AI · May 27

SPAR: Support-Preserving Action Rectification

arXiv:2605.27877 · 53.8
Predicted impact: top 56% in LG · last 90 days
Originality: Highly original
AI Analysis

In offline reinforcement learning, SPAR targets two problems, over-conservatism and gradient conflict, with the aim of stable and effective policy improvement from suboptimal data.

SPAR reframes offline policy improvement as local residual rectification anchored to a frozen behavior cloning policy, and uses Latent Self-Imitation to address gradient conflicts in the residual space. The authors report state-of-the-art performance on D4RL benchmarks, with significant gains over suboptimal baselines.

Offline policy improvement faces an inherent conflict between maximizing value and fitting the data distribution. In-sample weighted regression is stable, but it suffers from over-conservatism that suppresses high-value actions in the distribution tail; gradient-based approaches, conversely, often exhibit a gradient conflict between fitting and optimization, which drives the policy off the data manifold. To address this, we propose Support-Preserving Action Rectification (SPAR), which reframes global learning as local residual rectification anchored to a frozen, pure behavior cloning policy. This framework performs fine-grained fitting and local policy improvement in the residual space, thereby contracting the search space. We further introduce Latent Self-Imitation, which uses a latent-sampling weighted-regression mechanism to address the fitting-improvement gradient conflict in the residual space. Theoretically, we prove that this mechanism eliminates the manifold-normal drift of standard value gradients, while extensive D4RL experiments show that SPAR extracts significant gains from suboptimal baselines to achieve state-of-the-art performance.
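The abstract gives no equations, so the following is only a minimal sketch of the general idea of a bounded residual on top of a frozen base action, combined with value-weighted regression over sampled candidates. It is not the authors' implementation. Every name here (`rectify`, `softmax_weights`, `weighted_residual_target`, `toy_q`), the norm bound, the tanh "decoder" from latents to residuals, and the softmax weighting are assumptions chosen for illustration.

```python
import numpy as np


def rectify(base_action, residual, max_norm=0.1):
    """Add a norm-bounded residual to a frozen base action (hypothetical form)."""
    base_action = np.asarray(base_action, dtype=float)
    residual = np.asarray(residual, dtype=float)
    norm = np.linalg.norm(residual)
    if norm > max_norm:
        # Rescale so the corrected action stays close to the base action.
        residual = residual * (max_norm / norm)
    return base_action + residual


def softmax_weights(q_values, temperature=1.0):
    """Numerically stable softmax over critic values."""
    q = np.asarray(q_values, dtype=float) / temperature
    w = np.exp(q - q.max())
    return w / w.sum()


def weighted_residual_target(candidate_residuals, q_values, temperature=1.0):
    """Regression target: value-weighted average of sampled residuals."""
    weights = softmax_weights(q_values, temperature)
    return weights @ np.asarray(candidate_residuals, dtype=float)


def toy_q(action):
    """Hypothetical critic with its peak at action = [0.5, 0.0]."""
    return -float(np.sum((np.asarray(action) - np.array([0.5, 0.0])) ** 2))


rng = np.random.default_rng(0)
base_action = np.array([0.3, 0.1])    # stand-in for the frozen BC policy output
latents = rng.normal(size=(16, 2))    # sampled latent codes
candidates = 0.05 * np.tanh(latents)  # assumed decoder: latent -> small residual
q_values = np.array([toy_q(rectify(base_action, c)) for c in candidates])

# The residual is regressed toward its own high-value samples, so the target
# is a convex combination of residuals the policy could already produce.
target = weighted_residual_target(candidates, q_values, temperature=0.01)
new_action = rectify(base_action, target)
print("base:", base_action, "rectified:", new_action)
```

Because the target is a convex combination of sampled residuals, the rectified action cannot move further from the base action than the largest sampled residual. This is one simple way to read the "contracted search space" described above.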

