VideoDex: Learning Dexterity from Internet Videos
This work addresses the scarcity of real-world data for robotic agents by learning dexterity from human videos. The approach is novel, though incremental in that it builds on existing visual-prior methods.
The paper tackles the challenge of enabling robots to learn dexterous manipulation without extensive real-world experience. It leverages internet videos of humans using their hands and reports strong performance on various manipulation tasks, outperforming various state-of-the-art methods.
To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These action and physical priors in the neural network encode the typical human behavior for a particular robot task. We test our approach on a robot arm and dexterous hand-based system and show strong results on various manipulation tasks, outperforming various state-of-the-art methods. Videos at https://video-dex.github.io