CV · RO · Mar 10, 2025

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning

arXiv:2503.06960v2 · 5 citations · h-index: 21 · Has Code · CVPR
Originality: Incremental advance
AI Analysis

This paper addresses the problem of configuring pre-trained vision models for robotics by strengthening object-centric representation learning. The contribution is incremental, since it builds on existing methods.

The study finds that pre-trained vision models such as DINO and iBOT outperform MAE on robotics tasks but struggle when pre-trained on non-object-centric data, a weakness that correlates with a reduced ability to learn object-centric representations. It introduces SlotMIM, which improves transferability and achieves significant gains in image recognition, scene understanding, and robot learning evaluations.

Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear. Through systematic evaluation, we find that while DINO and iBOT outperform MAE across visuomotor control and perception tasks, they struggle when trained on non-(single-)object-centric (NOC) data, a limitation strongly correlated with their diminished ability to learn object-centric representations. This investigation indicates that the ability to form object-centric representations from non-object-centric robotics datasets is the key to success for PVMs. Motivated by this discovery, we designed SlotMIM, a method that induces object-centric representations through two components: a semantic bottleneck, which reduces the number of prototypes to encourage the emergence of objectness, and cross-view consistency regularization, which encourages multiview invariance. Our experiments encompass pre-training on object-centric, scene-centric, web-crawled, and ego-centric data. Across all settings, our approach learns transferable representations and achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations. When scaled up with million-scale datasets, our method also demonstrates superior data efficiency and scalability. Our code and models are publicly available at https://github.com/CVMI-Lab/SlotMIM.
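
The two ingredients named in the abstract, a small prototype set acting as a semantic bottleneck and a cross-view consistency term, can be illustrated with a toy NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the prototype count of 8, the temperatures, and the use of random vectors in place of real patch features are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def soft_assign(patches, prototypes, temperature):
    # Soft assignment of each patch (N, D) to each prototype (K, D).
    # Keeping K small is the "bottleneck" assumed here: many patches must
    # share few prototypes.
    logits = l2_normalize(patches) @ l2_normalize(prototypes).T / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def cross_view_consistency(patches_a, patches_b, prototypes,
                           t_target=0.05, t_pred=0.1):
    # Cross-entropy between prototype assignments of matched patches from
    # two views; view A's sharper assignment is treated as a fixed target.
    target = soft_assign(patches_a, prototypes, t_target)
    pred = soft_assign(patches_b, prototypes, t_pred)
    return float(-(target * np.log(pred + 1e-8)).sum(axis=-1).mean())

# Toy data: random vectors stand in for patch features (hypothetical sizes).
rng = np.random.default_rng(0)
num_patches, dim, num_prototypes = 16, 32, 8
prototypes = rng.normal(size=(num_prototypes, dim))
view_a = rng.normal(size=(num_patches, dim))
view_b_close = view_a + 0.01 * rng.normal(size=view_a.shape)  # same content, slightly perturbed
view_b_random = rng.normal(size=view_a.shape)                 # unrelated content

assignments = soft_assign(view_a, prototypes, 0.1)
loss_close = cross_view_consistency(view_a, view_b_close, prototypes)
loss_random = cross_view_consistency(view_a, view_b_random, prototypes)
print(assignments.shape, loss_close < loss_random)
```

In this toy setup the consistency term is lower when the second view carries the same content as the first, which is the behaviour such a regularizer is meant to reward. A real pipeline would compute patch features with a vision transformer, match patches across augmented views, and learn the prototypes jointly with the encoder.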

Code Implementations: 1 repo
