CV MM Oct 11, 2024

Ego3DT: Tracking Every 3D Object in Ego-centric Videos

Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang

arXiv:2410.08530v1 | 10.58 citations | h-index: 21 | MM

Originality: Highly original

AI Analysis

This paper addresses object tracking in ego-centric videos for embodied intelligence applications, and it presents a novel method rather than an incremental improvement.

The paper tackles the challenge of accurately localizing and tracking objects in ego-centric videos by introducing Ego3DT, a zero-shot approach for 3D reconstruction and tracking of all objects, achieving 1.04x to 2.90x improvements in HOTA on two newly compiled datasets.

The growing interest in embodied intelligence has brought ego-centric perspectives to contemporary research. One significant challenge in this area is the accurate localization and tracking of objects in ego-centric videos, primarily because of the substantial variability in viewing angles. To address this issue, this paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects in an ego-centric video. We present Ego3DT, a framework that first identifies objects within the ego environment and extracts their detection and segmentation information. Using information from adjacent video frames, Ego3DT dynamically constructs a 3D scene of the ego view with a pre-trained 3D scene reconstruction model. We also introduce a dynamic hierarchical association mechanism for creating stable 3D tracking trajectories of objects in ego-centric videos. Extensive experiments on two newly compiled datasets corroborate the efficacy of our approach, with 1.04x to 2.90x improvements in HOTA, showing the robustness and accuracy of our method in diverse ego-centric scenarios.
