LG · Feb 21, 2025

Hyperspherical Normalization for Scalable Deep Reinforcement Learning

arXiv:2502.15280v2 · 30 citations · h-index: 44 · Has Code · ICML
AI Analysis

This work addresses a key bottleneck for reinforcement learning researchers and practitioners by enabling more stable and scalable training, though it is incremental in that it builds on existing methods such as soft actor-critic.

The paper tackles the problem of unstable optimization and overfitting in deep reinforcement learning when scaling up model size and compute, introducing SimbaV2, which achieves state-of-the-art performance on 57 continuous control tasks across 4 domains.

Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains. The code is available at https://dojeon-ai.github.io/SimbaV2.
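The first idea in the abstract, keeping weight and feature norms from growing, can be illustrated with a small sketch. It assumes that hyperspherical normalization amounts to projecting each vector onto the unit hypersphere by L2 normalization, and that weights are re-projected after every gradient step. The function names `l2_normalize` and `sgd_step_on_sphere`, the learning rate, and the toy values are hypothetical and are not taken from the SimbaV2 codebase.

```python
import math


def l2_normalize(vec, eps=1e-8):
    """Project a vector onto the unit hypersphere (L2 norm of about 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def sgd_step_on_sphere(weights, grads, lr=0.1):
    """Hypothetical update: a plain gradient step, then re-project each row.

    Re-projecting after every step keeps each row's norm fixed, so the
    weight norm cannot grow over the course of training.
    """
    updated = [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    return [l2_normalize(row) for row in updated]


# Toy 2x2 weight matrix and gradients (illustrative values only).
weights = [[3.0, 4.0], [1.0, 0.0]]
grads = [[10.0, -10.0], [0.0, 5.0]]

new_weights = sgd_step_on_sphere(weights, grads)
row_norms = [math.sqrt(sum(x * x for x in row)) for row in new_weights]
```

Whatever the size of the gradient, every entry of `row_norms` stays at about 1 after the update, which is the norm constraint the abstract describes. The paper's actual layers, scaling terms, and distributional critic are not reproduced here.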

