GR · CV · Jun 10, 2025

iTACO: Interactable Digital Twins of Articulated Objects from Casually Captured RGBD Videos

arXiv:2506.08334v3 · 12 citations · h-index: 44
Originality: Incremental advance
AI Analysis

iTACO enables scalable acquisition of digital twins for embodied AI and robotics, though the advance is incremental because it builds on existing articulated-object methods.

The paper tackles the creation of interactable digital twins of articulated objects from casually captured RGBD videos, a setting made challenging by simultaneous object and camera motion and by occlusions. It shows that iTACO outperforms existing methods on a new dataset 20× larger than those in prior work.

Articulated objects are prevalent in daily life. Interactable digital twins of such objects have numerous applications in embodied AI and robotics. Unfortunately, current methods to digitize articulated real-world objects require carefully captured data, preventing practical, scalable, and generalizable acquisition. We focus on motion analysis and part-level segmentation of an articulated object from a casually captured RGBD video shot with a hand-held camera. A casually captured video of an interaction with an articulated object is easy to obtain at scale using smartphones. However, this setting is challenging due to simultaneous object and camera motion and significant occlusions as the person interacts with the object. To tackle these challenges, we introduce iTACO: a coarse-to-fine framework that infers joint parameters and segments movable parts of the object from a dynamic RGBD video. To evaluate our method under this new setting, we build a dataset of 784 videos containing 284 objects across 11 categories that is 20× larger than available in prior work. We then compare our approach with existing methods that also take video as input. Our experiments show that iTACO outperforms existing articulated object digital twin methods on both synthetic and real casually captured RGBD videos.
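To make "joint parameters" and "movable parts" concrete, here is a minimal sketch of how a digital twin might use them once they are known: a joint type, an axis, and a point on that axis are enough to move the segmented part's points to a new joint state. This is not iTACO's implementation, and the abstract does not specify the parameterization. The `Joint` class and `apply_joint` function are hypothetical names, and the choice of revolute and prismatic joints with Rodrigues' rotation formula is an assumption made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Hypothetical joint description (assumed, not taken from the paper)."""
    kind: str      # "revolute" (rotation) or "prismatic" (translation)
    axis: Vec3     # axis direction; normalized inside apply_joint
    origin: Vec3   # a point on the axis; only used for revolute joints


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("joint axis must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def apply_joint(joint: Joint, points: List[Vec3], state: float) -> List[Vec3]:
    """Move the points of a movable part to the given joint state.

    state is an angle in radians for a revolute joint and a distance
    along the axis for a prismatic joint.
    """
    k = _normalize(joint.axis)
    moved = []
    for p in points:
        if joint.kind == "prismatic":
            # Slide the point along the unit axis.
            moved.append((p[0] + state * k[0],
                          p[1] + state * k[1],
                          p[2] + state * k[2]))
        elif joint.kind == "revolute":
            # Rodrigues' formula, applied relative to a point on the axis.
            v = (p[0] - joint.origin[0],
                 p[1] - joint.origin[1],
                 p[2] - joint.origin[2])
            c, s = math.cos(state), math.sin(state)
            kxv = _cross(k, v)
            kdv = _dot(k, v)
            moved.append(tuple(
                v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c) + joint.origin[i]
                for i in range(3)
            ))
        else:
            raise ValueError(f"unknown joint kind: {joint.kind}")
    return moved


# Usage: swing a door-like part a quarter turn about a vertical hinge.
hinge = Joint(kind="revolute", axis=(0.0, 0.0, 1.0), origin=(0.0, 0.0, 0.0))
opened = apply_joint(hinge, [(1.0, 0.0, 0.0)], math.pi / 2)
```

Points of the static part would be left untouched, which is why part-level segmentation and joint estimation are needed together.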

