RO | AI | CV | Nov 19, 2025

In-N-On: Scaling Egocentric Manipulation with in-the-wild and on-task Data

arXiv:2511.15704v1 | 17 citations | h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the problem of scaling manipulation learning for robotics by leveraging diverse human data, though it appears incremental as it builds on existing domain adaptation and flow matching techniques.

The paper tackles the challenge of using heterogeneous egocentric videos to train manipulation policies. It introduces a scalable data collection recipe and a dataset (PHSD) with over 1,000 hours of in-the-wild data and over 20 hours of on-task data. The dataset is used to train a language-conditioned flow matching policy (Human0) that shows language following, few-shot learning, and improved robustness.

Egocentric videos are a valuable and scalable data source to learn manipulation policies. However, due to significant data heterogeneity, most existing approaches utilize human data for simple pre-training, which does not unlock its full potential. This paper first provides a scalable recipe for collecting and using egocentric data by categorizing human data into two categories, in-the-wild and on-task, along with a systematic analysis of how to use the data. We first curate a dataset, PHSD, which contains over 1,000 hours of diverse in-the-wild egocentric data and over 20 hours of on-task data directly aligned to the target manipulation tasks. This enables learning a large egocentric language-conditioned flow matching policy, Human0. With domain adaptation techniques, Human0 minimizes the gap between humans and humanoids. Empirically, we show Human0 achieves several novel properties from scaling human data, including language following of instructions from only human data, few-shot learning, and improved robustness using on-task data. Project website: https://xiongyicai.github.io/In-N-On/
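
The abstract names a language-conditioned flow matching policy but does not spell out the training objective. As a rough illustration only, the sketch below shows a generic linear-path conditional flow matching loss and an Euler sampler in plain Python. It is not the authors' implementation of Human0: the function names, the `velocity_fn` callable, and `cond` (standing in for a language or observation embedding) are all hypothetical, and the straight-line noise-to-action path is an assumed, commonly used choice.

```python
import random


def flow_matching_target(action, noise, t):
    """Return (x_t, target_velocity) on the straight path from noise to action."""
    # Assumed linear path: x_t = (1 - t) * noise + t * action
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    # Velocity of that path is constant in t: action - noise
    target = [a - n for n, a in zip(noise, action)]
    return x_t, target


def flow_matching_loss(velocity_fn, actions, conds, rng):
    """Mean squared error between predicted and target velocities."""
    total, count = 0.0, 0
    for action, cond in zip(actions, conds):
        noise = [rng.gauss(0.0, 1.0) for _ in action]  # x_0 ~ N(0, I)
        t = rng.random()                               # t in [0, 1)
        x_t, target = flow_matching_target(action, noise, t)
        pred = velocity_fn(x_t, t, cond)               # hypothetical model call
        total += sum((p - u) ** 2 for p, u in zip(pred, target))
        count += len(action)
    return total / count


def sample_action(velocity_fn, cond, action_dim, rng, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In practice `velocity_fn` would be a learned network conditioned on egocentric observations and a language instruction; here it is any callable taking `(x, t, cond)` and returning a list of the same length as `x`.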

