CV | Mar 26

PAWS: Perception of Articulation in the Wild at Scale from Egocentric Videos

arXiv:2603.25539 | 82.6 | h-index: 45
AI Analysis

This paper addresses articulation perception for robotics, simulation, and animation, scaling beyond supervised methods that depend on manual annotations.

The paper tackles the problem of recovering the motion and structure of articulated objects from egocentric videos. It proposes PAWS, which extracts object articulations from hand-object interactions and reports significant improvements over baselines on the HD-EPIC and Arti4D datasets.

Articulation perception aims to recover the motion and structure of articulated objects (e.g., drawers and cupboards), and is fundamental to 3D scene understanding in robotics, simulation, and animation. Existing learning-based methods rely heavily on supervised training with high-quality 3D data and manual annotations, limiting scalability and diversity. To address this limitation, we propose PAWS, a method that directly extracts object articulations from hand-object interactions in large-scale in-the-wild egocentric videos. We evaluate our method on public datasets, including HD-EPIC and Arti4D, achieving significant improvements over baselines. We further demonstrate that the extracted articulations benefit downstream tasks, including fine-tuning 3D articulation prediction models and enabling robot manipulation. See the project website at https://aaltoml.github.io/PAWS/.
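To make "motion and structure" concrete: a drawer is commonly modeled as a prismatic joint (translation along an axis) and a cupboard door as a revolute joint (rotation about a hinge axis). The sketch below is not the PAWS pipeline, which the abstract does not describe in detail. It is a minimal illustration under stated assumptions: the input is a hypothetical array of 3D hand contact points tracked over one interaction, and the function names and the `ratio` threshold are invented for this example. It fits a line and a circle to the trajectory and picks whichever model explains the points clearly better.

```python
import numpy as np


def fit_prismatic(points):
    """Fit a 3D line to the points; return (unit direction, RMS distance to the line)."""
    centered = points - points.mean(axis=0)
    # The first right singular vector is the direction of largest variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    off_axis = centered - np.outer(centered @ direction, direction)
    rms = float(np.sqrt((off_axis ** 2).sum(axis=1).mean()))
    return direction, rms


def fit_revolute(points):
    """Fit a 3D circle to the points; return (center, axis, radius, RMS error)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Best-fit plane: u and v span the plane, normal is the candidate hinge axis.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    u, v, normal = vt[0], vt[1], vt[2]
    x, y = centered @ u, centered @ v
    # Algebraic (Kasa) circle fit in the plane: x^2 + y^2 = 2*a*x + 2*b*y + c.
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, x ** 2 + y ** 2, rcond=None)
    radius = float(np.sqrt(max(c + a ** 2 + b ** 2, 0.0)))
    center = centroid + a * u + b * v
    # Error combines in-plane radial deviation and out-of-plane distance.
    radial = np.sqrt((x - a) ** 2 + (y - b) ** 2) - radius
    out_of_plane = centered @ normal
    rms = float(np.sqrt((radial ** 2 + out_of_plane ** 2).mean()))
    return center, normal, radius, rms


def classify_articulation(points, ratio=0.5):
    """Label a trajectory as prismatic or revolute.

    `ratio` is an illustrative threshold (an assumption): the circle model is
    chosen only when its error is well below the line model's error.
    """
    points = np.asarray(points, dtype=float)
    direction, line_rms = fit_prismatic(points)
    center, normal, radius, circle_rms = fit_revolute(points)
    if circle_rms < ratio * line_rms:
        return "revolute", {"center": center, "axis": normal, "radius": radius}
    return "prismatic", {"direction": direction}


# Synthetic example: a drawer pulled 0.4 m along y, and a door swung 90 degrees.
t = np.linspace(0.0, 1.0, 30)
drawer = np.outer(t, [0.0, 0.4, 0.0]) + np.array([1.0, 0.0, 0.8])
angle = t * np.pi / 2
door = np.column_stack([1.0 + 0.5 * np.cos(angle),
                        2.0 + 0.5 * np.sin(angle),
                        np.zeros_like(angle)])
print(classify_articulation(drawer)[0])
print(classify_articulation(door)[0])
```

The sign of the recovered direction or hinge axis is arbitrary, and real trajectories would be noisy and partially occluded, so a practical system would need a more robust fit and model-selection rule than this one.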

