LG · RO · May 7

Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning

Nandiraju Gireesh, Yuanliang Ju, He Wang

arXiv:2605.05544

Predicted impact: top 32% in LG · last 90 days

Originality: Incremental advance

AI Analysis

For RL practitioners, AQC provides a principled method to dynamically adjust action chunk sizes, improving both offline and online performance in manipulation tasks.

Adaptive Q-Chunking (AQC) addresses the suboptimality of fixed action chunk sizes in offline-to-online RL by adaptively selecting chunk sizes per state, achieving state-of-the-art success rates on OGBench and Robomimic and boosting performance on RoboCasa-GR1 tasks.

Offline-to-online reinforcement learning with action chunking eliminates multi-step off-policy bias and enables temporally coherent exploration, but all existing methods use a fixed chunk size across every state. This is suboptimal: near contact events the agent needs short chunks for reactive control, while during free-space motion long chunks provide better credit assignment. The natural solution is to train critics for several chunk sizes and select the best one at each state, but naive comparison of learned critic values systematically collapses to the shortest chunk due to discount-scale mismatch, and degrades to noise in low-value states. We propose Adaptive Q-Chunking (AQC), which resolves both failures by comparing the advantage of each chunk size relative to a per-horizon baseline, normalized by the discount factor. This criterion converts biased wrong answers into unbiased near-random choices when no genuine signal exists, and becomes discriminative when a particular scale enables better planning. We prove theoretical bounds on the advantage selector's noise immunity and on the value dominance of adaptive chunking over any fixed chunk size. We demonstrate that AQC achieves state-of-the-art offline and online success rates on OGBench and Robomimic, and can be applied to enhance the performance of large-scale VLA models that predict action sequences, significantly boosting performance on RoboCasa-GR1 tasks.
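The selection rule described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the toy numbers, and in particular the normalization, which is taken here to be division of each advantage by `gamma ** h` (the abstract says only that the advantage is "normalized by the discount factor"). The critic values and per-horizon baselines are passed in as plain lists standing in for learned networks.

```python
from typing import List, Sequence


def normalized_advantages(
    q_values: Sequence[float],
    baselines: Sequence[float],
    chunk_sizes: Sequence[int],
    gamma: float,
) -> List[float]:
    """Advantage of each chunk size over its own per-horizon baseline.

    ASSUMPTION: the normalization divides by gamma ** h. The source only
    states that the advantage is normalized by the discount factor.
    """
    return [
        (q - v) / (gamma ** h)
        for q, v, h in zip(q_values, baselines, chunk_sizes)
    ]


def select_chunk_size(
    q_values: Sequence[float],
    baselines: Sequence[float],
    chunk_sizes: Sequence[int],
    gamma: float,
) -> int:
    """Pick the chunk size whose normalized advantage is largest."""
    scores = normalized_advantages(q_values, baselines, chunk_sizes, gamma)
    best = max(range(len(scores)), key=scores.__getitem__)
    return chunk_sizes[best]


def naive_select(q_values: Sequence[float], chunk_sizes: Sequence[int]) -> int:
    """Baseline for comparison: pick the chunk size with the largest raw Q."""
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return chunk_sizes[best]


# Toy, hand-picked numbers for a single state (hypothetical, not from the paper).
chunk_sizes = [1, 4, 8]
q_values = [10.0, 9.5, 9.0]    # stand-ins for Q_h(s, a_{1:h})
baselines = [9.9, 9.0, 8.95]   # stand-ins for the per-horizon baseline at s
gamma = 0.99

print("naive choice:", naive_select(q_values, chunk_sizes))
print("advantage-based choice:",
      select_chunk_size(q_values, baselines, chunk_sizes, gamma))
```

In this toy state the raw critic values favor the shortest chunk simply because they are largest, while the advantage-based rule prefers the chunk size that improves most over its own baseline. The numbers are chosen by hand to show that contrast and say nothing about how often it occurs in practice.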
