ROAI, Oct 12, 2025

UniCoD: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning

arXiv:2510.10642v2 · 3 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the problem of enhancing robot policy learning for diverse tasks in open-ended environments, representing an incremental advancement by combining understanding and generation capabilities from pretrained models.

The paper tackles the challenge of building generalist robot policies by introducing UniCoD, which learns unified continuous and discrete representations through large-scale pretraining on instructional videos followed by fine-tuning on robot data. It reports improvements of 9% and 12% over baselines in simulation and in real-world out-of-distribution tasks, respectively.

Building generalist robot policies that can handle diverse tasks in open-ended environments is a central challenge in robotics. To leverage knowledge from large-scale pretraining, prior vision-language-action (VLA) work has typically built generalist policies on top of either vision-language understanding models (VLMs) or generative models. However, both semantic understanding from vision-language pretraining and visual dynamics modeling from visual-generation pretraining are crucial for embodied robots. Recent unified models of generation and understanding have demonstrated strong capabilities in both comprehension and generation through large-scale pretraining. We posit that robotic policy learning can likewise benefit from the combined strengths of understanding, planning, and continuous future representation learning. Building on this insight, we introduce UniCoD, which acquires the ability to dynamically model high-dimensional visual features through pretraining on over 1M internet-scale instructional manipulation videos. UniCoD is then fine-tuned on data collected from the robot embodiment, enabling it to learn mappings from predictive representations to action tokens. Extensive experiments show that our approach consistently outperforms baseline methods, by 9% in simulation environments and by 12% in real-world out-of-distribution tasks.
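The abstract says fine-tuning learns a mapping "from predictive representations to action tokens" but does not describe how continuous robot actions become tokens. Purely as an illustration, the sketch below shows one common scheme from VLA work: uniform per-dimension binning of a normalized action vector. The function names, the [-1, 1] action range and the 256-bin count are hypothetical assumptions, not details taken from UniCoD.

```python
from typing import List, Sequence

# Hypothetical illustration: uniform binning of continuous actions into discrete
# tokens. Not confirmed to be UniCoD's tokenizer.

def tokenize_action(action: Sequence[float], low: float = -1.0,
                    high: float = 1.0, num_bins: int = 256) -> List[int]:
    """Map each continuous action dimension to a bin index in [0, num_bins - 1]."""
    width = (high - low) / num_bins
    tokens = []
    for a in action:
        clipped = min(max(a, low), high)  # clamp out-of-range values to [low, high]
        idx = int((clipped - low) / width)
        tokens.append(min(idx, num_bins - 1))  # a == high would otherwise map to num_bins
    return tokens


def detokenize_action(tokens: Sequence[int], low: float = -1.0,
                      high: float = 1.0, num_bins: int = 256) -> List[float]:
    """Map bin indices back to continuous values at the bin centers."""
    width = (high - low) / num_bins
    return [low + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    action = [0.0, -1.0, 1.0, 0.5]  # e.g. a normalized 4-D end-effector command
    tokens = tokenize_action(action)
    print(tokens)
    print(detokenize_action(tokens))
```

Under this assumed scheme, a policy head would predict one token per action dimension, and `detokenize_action` maps the predicted indices back to bin centers, so the quantization error per dimension is at most half a bin width.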

