From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning
This work addresses the challenge of efficiently mastering complex robot skills. It is an incremental advance that combines pretraining with RL finetuning.
The paper tackles the problem of refining pretrained generative robot policies into high-performing ones. It introduces Distribution Contractive Reinforcement Learning (DICE-RL), which amplifies high-success behaviors from online feedback. The result is reliable performance improvement, with strong stability and sample efficiency, on complex long-horizon manipulation tasks from high-dimensional pixel inputs.
We introduce Distribution Contractive Reinforcement Learning (DICE-RL), a framework that uses reinforcement learning (RL) as a "distribution contraction" operator to refine pretrained generative robot policies. DICE-RL turns a pretrained behavior prior into a high-performing "pro" policy by amplifying high-success behaviors from online feedback. We pretrain a diffusion- or flow-based policy for broad behavioral coverage, then finetune it with a stable, sample-efficient residual off-policy RL framework that combines selective behavior regularization with value-guided action selection. Extensive experiments and analyses show that DICE-RL reliably improves performance with strong stability and sample efficiency. It enables mastery of complex long-horizon manipulation skills directly from high-dimensional pixel inputs, both in simulation and on a real robot. Project website: https://zhanyisun.github.io/dice.rl.2026/.
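
The abstract does not spell out how the residual policy and value-guided action selection fit together, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a best-of-N scheme: draw several candidate actions from the pretrained prior, add a small learned residual correction to each, score the corrected candidates with a critic, and execute the highest-scoring one. The names `select_action`, `sample_prior`, `residual`, `q_value`, `num_candidates`, and `residual_scale` are all hypothetical, and the toy functions stand in for the diffusion- or flow-based prior, the residual network, and the learned Q-function.

```python
import numpy as np


def select_action(obs, sample_prior, residual, q_value,
                  num_candidates=8, residual_scale=0.1, rng=None):
    """Pick the best of N residual-corrected prior samples (assumed scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    # 1. Sample candidate actions from the pretrained behavior prior.
    base = sample_prior(obs, num_candidates, rng)            # shape (N, action_dim)
    # 2. Apply a small residual correction so actions stay close to the prior.
    refined = np.clip(base + residual_scale * residual(obs, base), -1.0, 1.0)
    # 3. Score each candidate with the critic and keep the highest-value one.
    scores = q_value(obs, refined)                           # shape (N,)
    best = int(np.argmax(scores))
    return refined[best], scores


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_prior(obs, n, rng):
    # Broad prior: uniform actions in [-1, 1]^2.
    return rng.uniform(-1.0, 1.0, size=(n, 2))


def toy_residual(obs, actions):
    # Residual that nudges every candidate toward the origin.
    return -actions


def toy_q(obs, actions):
    # Critic that prefers actions near the origin.
    return -np.sum(actions ** 2, axis=1)


rng = np.random.default_rng(0)
obs = np.zeros(4)
action, scores = select_action(obs, toy_prior, toy_residual, toy_q,
                               num_candidates=16, rng=rng)
print("selected action:", action)
print("best score:", scores.max())
```

In this reading, the prior supplies coverage, the bounded residual keeps the finetuned policy near it, and the argmax over critic scores concentrates probability mass on high-value actions, which is one way to interpret "distribution contraction".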