CV, May 16, 2023

Understanding 3D Object Interaction from a Single Image

arXiv:2305.09664v2 (37 citations)
Originality: Incremental advance
AI Analysis

The paper addresses a challenge for intelligent agents in robotics and scene exploration, but it appears incremental because it builds on existing transformer methods.

The paper tackles the problem of enabling machines to understand 3D object interactions from a single image, using a transformer-based model to predict object properties and affordances, and reports strong performance and generalization to robotics data.

Humans can easily understand a single image as depicting multiple objects that permit interaction. We use this skill to plan our interactions with the world and to understand new objects faster, without having to interact with them. In this paper, we aim to endow machines with a similar ability, so that intelligent agents can better explore 3D scenes or manipulate objects. Our approach is a transformer-based model that predicts the 3D location, physical properties, and affordance of objects. To power this model, we collect a dataset of Internet videos, egocentric videos, and indoor images to train and validate our approach. Our model yields strong performance on our data and generalizes well to robotics data. Project site: https://jasonqsy.github.io/3DOI/

Code Implementations: 1 repo
