LG · May 29

Survival Reinforcement Learning: Toward Scalable Self-Supervised RL

arXiv:2605.3127374.3
AI Analysis

This work gives researchers and practitioners working on long-horizon robotic control a more scalable self-supervised reinforcement learning approach. It is an incremental improvement over existing contrastive RL (CRL) methods.

This paper addresses long-horizon goal-conditioned planning in self-supervised Contrastive Reinforcement Learning (CRL), where contrastive losses face the uniformity-tolerance dilemma. The authors introduce Survival Reinforcement Learning (SRL), an online classification-based method that maximizes the agent's dwell time at target goals. Scaled SRL outperforms scaled CRL by 2x to 8x on long-horizon locomotion tasks and matches it on manipulation tasks.

While self-supervised Contrastive Reinforcement Learning (CRL) has shown remarkable depth-scaling capabilities, successfully using networks over 64 layers, scaled CRL still struggles with long-horizon goal-conditioned planning due to the uniformity-tolerance dilemma inherent in contrastive losses. We introduce Survival Reinforcement Learning (SRL), an online classification-based alternative that extends the survival value learning framework by maximizing the agent's dwell time at target goals. SRL bypasses the structural constraints of CRL and mitigates the "bang-bang" control solutions inherent to survival frameworks, which often induce undesirable behavior in complex dynamical systems. Evaluated across diverse robotic benchmarks, scaled SRL matches state-of-the-art CRL on manipulation tasks and outperforms it by 2x to 8x on stable, long-horizon locomotion tasks. Our results provide strong additional evidence that classification-based methods may serve as a key primitive in the broader effort to scale reinforcement learning.
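The uniformity-tolerance dilemma mentioned above can be illustrated with a generic InfoNCE loss, the standard contrastive objective. The sketch below is not the paper's code and is not the SRL objective; the function names, the toy score matrix, and the temperature values are all assumptions chosen for illustration. It shows that a low temperature concentrates almost all of the repulsive gradient on the hardest negative (little tolerance for near-goal states), while a high temperature spreads it almost evenly (weak separation).

```python
import math


def info_nce_loss(scores, temperature):
    """Mean InfoNCE loss over a square score matrix (illustrative only).

    scores[i][j] is a hypothetical critic score between state-action i and
    goal j; the diagonal entry is treated as the positive pair.
    """
    total = 0.0
    for i, row in enumerate(scores):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]  # negative log-softmax of the positive
    return total / len(scores)


def negative_penalty_shares(row, positive_index, temperature):
    """Fraction of the repulsive gradient assigned to each negative.

    For InfoNCE, the gradient on negative j is proportional to
    exp(score_j / temperature), so the shares are a softmax over negatives.
    """
    negatives = [s / temperature for j, s in enumerate(row) if j != positive_index]
    peak = max(negatives)
    weights = [math.exp(l - peak) for l in negatives]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy scores (assumed): goal 1 is a hard negative for row 0 (0.8 vs. 0.9).
scores = [
    [0.9, 0.8, 0.1],
    [0.2, 0.9, 0.1],
    [0.1, 0.3, 0.9],
]

sharp = negative_penalty_shares(scores[0], 0, temperature=0.05)
smooth = negative_penalty_shares(scores[0], 0, temperature=5.0)
print("low temperature shares: ", sharp)
print("high temperature shares:", smooth)
print("loss at temperature 1.0:", info_nce_loss(scores, 1.0))
```

With the low temperature, nearly the entire penalty falls on the hard negative, which may be a goal that is genuinely close to the positive one. With the high temperature, hard and easy negatives are penalized almost equally. No single temperature resolves both failure modes.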
