ROMar 8

Residual Control for Fast Recovery from Dynamics Shifts

Nethmi Jayasinghe, Diana Gontero, Francesco Migliarba, Spencer T. Brown, Vinod K. Sangwan, Mark C. Hersam, Amit Ranjan Trivedi

arXiv:2603.07775v16.8h-index: 124

Predicted impact top 78% in RO · last 90 daysOriginality Highly original

AI Analysis

This work addresses the critical problem of fast recovery from unobserved dynamics shifts for real-world robotic systems, which is important for robust and reliable robot operation.

Robotic systems face performance degradation from unobserved dynamics shifts during execution. This paper proposes a residual control architecture that adapts to these shifts by adding a bounded residual control channel to a fixed, pre-trained policy, reducing recovery time by 87% on Go1, 48% on Cassie, 30% on H1, and 20% on Scout robots.

Robotic systems operating in real-world environments inevitably encounter unobserved dynamics shifts during continuous execution, including changes in actuation, mass distribution, or contact conditions. When such shifts occur mid-episode, even locally stabilizing learned policies can experience substantial transient performance degradation. While input-to-state stability guarantees bounded state deviation, it does not ensure rapid restoration of task-level performance. We address inference-time recovery under frozen policy parameters by casting adaptation as constrained disturbance shaping around a nominal stabilizing controller. We propose a stability-aligned residual control architecture in which a reinforcement learning policy trained under nominal dynamics remains fixed at deployment, and adaptation occurs exclusively through a bounded additive residual channel. A Stability Alignment Gate (SAG) regulates corrective authority through magnitude constraints, directional coherence with the nominal action, performance-conditioned activation, and adaptive gain modulation. These mechanisms preserve the nominal closed-loop structure while enabling rapid compensation for unobserved dynamics shifts without retraining or privileged disturbance information. Across mid-episode perturbations including actuator degradation, mass variation, and contact changes, the proposed method consistently reduces recovery time relative to frozen and online-adaptation baselines while maintaining near-nominal steady-state performance. Recovery time is reduced by \textbf{87\%} on the Go1 quadruped, \textbf{48\%} on the Cassie biped, \textbf{30\%} on the H1 humanoid, and \textbf{20\%} on the Scout wheeled platform on average across evaluated conditions relative to a frozen SAC policy.

View on arXiv PDF

Similar