OpenEgo: A Large-Scale Multimodal Egocentric Dataset for Dexterous Manipulation
This dataset lowers the barrier to learning dexterous manipulation from egocentric video and supports reproducible research in vision-language-action learning, although its contribution is incremental because it unifies six existing public datasets.
The paper addresses the lack of fine-grained, annotated egocentric video for imitation learning by introducing OpenEgo, a large-scale multimodal dataset of 1107 hours covering 290 manipulation tasks. The authors validate it by training language-conditioned policies to predict dexterous hand trajectories.
Egocentric human videos provide scalable demonstrations for imitation learning, but existing corpora often lack either fine-grained, temporally localized action descriptions or dexterous hand annotations. We introduce OpenEgo, a multimodal egocentric manipulation dataset with standardized hand-pose annotations and intention-aligned action primitives. OpenEgo totals 1107 hours across six public datasets, covering 290 manipulation tasks in 600+ environments. We unify hand-pose layouts and provide descriptive, timestamped action primitives. To validate its utility, we train language-conditioned imitation-learning policies to predict dexterous hand trajectories. OpenEgo is designed to lower the barrier to learning dexterous manipulation from egocentric video and to support reproducible research in vision-language-action learning. All resources and instructions will be released at www.openegocentric.com.
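The abstract does not specify the released annotation format. As a minimal sketch only, the following shows one way a timestamped action primitive could be represented and looked up by time. The class `ActionPrimitive`, its fields, the helper `primitive_at`, and the sample clip are all hypothetical and are not taken from OpenEgo.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActionPrimitive:
    """Hypothetical record: one timestamped action description in a clip."""
    start_s: float      # start time in seconds (assumed unit)
    end_s: float        # end time in seconds, exclusive
    description: str    # free-text description of the primitive


def primitive_at(primitives: List[ActionPrimitive], t: float) -> Optional[ActionPrimitive]:
    """Return the first primitive whose [start_s, end_s) interval contains t."""
    for p in primitives:
        if p.start_s <= t < p.end_s:
            return p
    return None


# Illustrative, made-up annotations for a single clip.
clip = [
    ActionPrimitive(0.0, 1.5, "reach for the cup"),
    ActionPrimitive(1.5, 3.0, "grasp the cup"),
]
```

A language-conditioned policy could use such a lookup to pair each video frame with the description active at that frame's timestamp, returning `None` for frames outside any annotated interval.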