AI | Nov 20, 2025

Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization

Yi Zhang, Che Liu, Xiancong Ren, Hanchu Ni, Yingji Zhang, Shuai Zhang, Zeyuan Ding, Jiayu Hu, Haozhe Shan, Junbo Qi, Yan Bai, Dengjie Li

arXiv:2511.16602v1 | 7.82 citations | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the resource-prohibitive cost of training embodied AI systems by offering a systematic framework that alleviates data and resource bottlenecks. It appears incremental, however, because it builds on existing methods such as supervised fine-tuning and reinforcement learning.

The paper tackles the embodied data bottleneck and algorithmic inefficiency in developing embodied intelligence systems by introducing Deliberate Practice Policy Optimization (DPPO), a metacognitive training framework that alternates between supervised fine-tuning and reinforcement learning to maximize learning efficiency from sparse data. This approach yields a 20.3% performance improvement over the base model and surpasses open-source models at the 100B-parameter scale by 10.6%.

Developing a universal and versatile embodied intelligence system presents two primary challenges: the critical embodied data bottleneck, where real-world data is scarce and expensive, and the algorithmic inefficiency of existing methods, which are resource-prohibitive. To address these limitations, we introduce Deliberate Practice Policy Optimization (DPPO), a metacognitive "Metaloop" training framework that dynamically alternates between supervised fine-tuning (competence expansion) and reinforcement learning (skill refinement). This enables automatic weakness identification and targeted resource allocation, specifically designed to maximize learning efficiency from sparse, finite data. Theoretically, DPPO can be formalised as a unified preference-learning framework. Empirically, training a vision-language embodied model with DPPO, referred to as Pelican-VL 1.0, yields a 20.3% performance improvement over the base model and surpasses open-source models at the 100B-parameter scale by 10.6%. We are open-sourcing both the models and code, providing the first systematic framework that alleviates the data and resource bottleneck and enables the community to build versatile embodied agents efficiently.
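The alternating schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The names `rl_phase`, `sft_phase`, and `metaloop`, the representation of a model as a dictionary of per-task success probabilities, and all numeric update rules are hypothetical and chosen only to make the loop runnable. The sketch shows the general shape: a reinforcement-style practice phase surfaces tasks the model never solves, and a supervised phase then spends its budget on exactly those tasks.

```python
import random


def rl_phase(skill, tasks, rng, step=0.1, rollouts=8):
    """Toy 'skill refinement' (hypothetical stand-in for RL).

    `skill` maps task name -> success probability. Each task is attempted
    `rollouts` times. Tasks with at least one success get a small boost;
    tasks with no success are returned as identified weaknesses.
    """
    weaknesses = []
    for task in tasks:
        successes = sum(rng.random() < skill[task] for _ in range(rollouts))
        if successes == 0:
            weaknesses.append(task)
        else:
            skill[task] = min(1.0, skill[task] + step * successes / rollouts)
    return weaknesses


def sft_phase(skill, weaknesses, boost=0.3):
    """Toy 'competence expansion' (hypothetical stand-in for SFT).

    Applies a larger, targeted update only to the weak tasks.
    """
    for task in weaknesses:
        skill[task] = min(1.0, skill[task] + boost)


def metaloop(skill, rounds=5, seed=0):
    """Alternate the two phases; return the weakness count per round."""
    rng = random.Random(seed)
    tasks = list(skill)
    history = []
    for _ in range(rounds):
        weaknesses = rl_phase(skill, tasks, rng)  # practice, find failures
        sft_phase(skill, weaknesses)              # target the failures
        history.append(len(weaknesses))
    return history
```

With a starting dictionary such as `{"grasp": 0.0, "navigate": 0.5, "stack": 0.9}` (made-up task names), the first round is guaranteed to flag `grasp` as a weakness, because a task with zero success probability can never succeed during practice, so it only improves through the supervised phase.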

