RO · AI · CV · Oct 22, 2022

H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions

arXiv:2210.12521v1 · 7 citations · h-index: 137
Originality: Highly original
AI Analysis

This paper addresses the challenge of robotic manipulation of complex articulated objects. It is incremental in that it builds on existing frameworks, adding a probabilistic generative approach.

The paper tackles the problem of enabling autonomous agents to understand and manipulate articulated objects through strategic trial-and-error, proposing the H-SAUR framework that significantly outperforms the state-of-the-art on benchmarks like PartNet-Mobility and a novel PuzzleBoxes dataset, despite using zero training data.

The world is filled with articulated objects whose use is difficult to determine from vision alone, e.g., a door might open inwards or outwards. Humans handle these objects with strategic trial-and-error: first pushing a door, then pulling if that doesn't work. We enable these capabilities in autonomous agents by proposing "Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR), a probabilistic generative framework that simultaneously generates a distribution of hypotheses about how objects articulate given input observations, captures certainty over hypotheses over time, and infers plausible actions for exploration and goal-conditioned manipulation. We compare our model with existing work in manipulating objects after a handful of exploration actions, on the PartNet-Mobility dataset. We further propose a novel PuzzleBoxes benchmark that contains locked boxes that require multiple steps to solve. We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework, despite using zero training data. We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
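The hypothesize, simulate, act, update loop described in the abstract can be illustrated with a toy sketch of the door example. This is not the authors' implementation: the `Hypothesis` class, the `simulate`, `update`, `choose_action`, and `explore` functions, the two-hypothesis door, and the `noise` parameter are all hypothetical names and simplifications chosen for illustration. The sketch keeps a discrete belief over hypotheses, picks the action most likely to move the part, and applies a Bayesian update after each observed outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A toy articulation hypothesis: which action would move the part."""
    name: str
    opens_with: str


def simulate(hypothesis, action):
    """Predict whether the part moves if `hypothesis` were true."""
    return hypothesis.opens_with == action


def update(belief, action, moved, noise=0.1):
    """Bayesian update of the belief given the observed outcome.

    `noise` is an assumed probability that an observation contradicts
    the prediction of a correct hypothesis.
    """
    posterior = {}
    for hyp, prob in belief.items():
        agrees = simulate(hyp, action) == moved
        likelihood = (1.0 - noise) if agrees else noise
        posterior[hyp] = prob * likelihood
    total = sum(posterior.values())
    return {hyp: p / total for hyp, p in posterior.items()}


def choose_action(belief, actions):
    """Pick the action with the highest believed chance of moving the part."""
    def expected_success(action):
        return sum(p for hyp, p in belief.items() if simulate(hyp, action))
    return max(actions, key=expected_success)


def explore(true_hypothesis, hypotheses, actions, max_steps=3):
    """Run the loop until the part moves or the step budget runs out."""
    belief = {hyp: 1.0 / len(hypotheses) for hyp in hypotheses}  # hypothesize
    history = []
    for _ in range(max_steps):
        action = choose_action(belief, actions)        # simulate and choose
        moved = simulate(true_hypothesis, action)      # act (stand-in environment)
        belief = update(belief, action, moved)         # update
        history.append((action, moved))
        if moved:
            break                                      # otherwise repeat
    return belief, history


push_door = Hypothesis("push-to-open", "push")
pull_door = Hypothesis("pull-to-open", "pull")
belief, history = explore(pull_door, [push_door, pull_door], ["push", "pull"])
print(history)
print({hyp.name: round(p, 3) for hyp, p in belief.items()})
```

In this toy run the agent starts with a uniform belief, tries pushing, observes no motion, shifts most of its belief to the pull hypothesis, and then pulls. A real system would replace the boolean `simulate` with a physics simulation over richer articulation hypotheses.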
