RO AI CV LG NE
Apr 17, 2023

Affordances from Human Videos as a Versatile Representation for Robotics

arXiv:2304.08488v1
312 citations
h-index: 24
Originality: Highly original
AI Analysis

This work addresses the gap between models trained on static datasets and real-world robotic applications by using human videos to improve robot interaction capabilities, a novel approach to a known bottleneck.

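The paper tackles the problem of enabling robots to learn from human videos by developing a visual affordance model that predicts where and how humans interact in a scene. It integrates this model with four robot learning paradigms (offline imitation learning, exploration, goal-conditioned learning, and action parameterization for reinforcement learning) and evaluates it across 4 real-world environments, more than 10 tasks, and 2 robotic platforms.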

Building a robot that can understand and learn to interact by watching humans has inspired several vision problems. However, despite some successful results on static datasets, it remains unclear how current models can be used on a robot directly. In this paper, we aim to bridge this gap by leveraging videos of human interactions in an environment-centric manner. Utilizing internet videos of human behavior, we train a visual affordance model that estimates where and how in the scene a human is likely to interact. The structure of these behavioral affordances directly enables the robot to perform many complex tasks. We show how to seamlessly integrate our affordance model with four robot learning paradigms including offline imitation learning, exploration, goal-conditioned learning, and action parameterization for reinforcement learning. We show the efficacy of our approach, which we call VRB, across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild. Results, visualizations, and videos are available at https://robo-affordances.github.io/
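
To make the "where and how" idea concrete, here is a minimal sketch of how an affordance prediction could be turned into a simple action parameterization. It is not the VRB implementation. It assumes, purely for illustration, that the "where" is a 2D heatmap of contact likelihood over image pixels and the "how" is a single post-contact direction in pixel space. The function names `sample_contact_point` and `affordance_to_action` and the `step_px` parameter are hypothetical.

```python
import math
import random
from typing import List, Tuple

Pixel = Tuple[int, int]


def sample_contact_point(heatmap: List[List[float]], rng: random.Random) -> Pixel:
    """Sample a (row, col) pixel with probability proportional to its score.

    Assumption: the heatmap holds non-negative contact likelihoods.
    """
    cells = [(r, c) for r, row in enumerate(heatmap) for c in range(len(row))]
    weights = [heatmap[r][c] for r, c in cells]
    if sum(weights) <= 0:
        raise ValueError("heatmap must contain at least one positive score")
    return rng.choices(cells, weights=weights, k=1)[0]


def affordance_to_action(
    heatmap: List[List[float]],
    direction: Tuple[float, float],
    step_px: float,
    rng: random.Random,
) -> Tuple[Pixel, Tuple[float, float]]:
    """Return (contact pixel, target pixel) for a reach-then-move primitive.

    Assumption: `direction` is a (d_row, d_col) vector in image space; it is
    normalized and scaled by `step_px` to get the post-contact target.
    """
    contact = sample_contact_point(heatmap, rng)
    norm = math.hypot(direction[0], direction[1])
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = (direction[0] / norm, direction[1] / norm)
    target = (contact[0] + step_px * unit[0], contact[1] + step_px * unit[1])
    return contact, target
```

On a real robot, the contact and target pixels would still need to be lifted to 3D (for example with a depth camera and calibrated camera pose) before being passed to a controller; that step is outside this sketch.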

