CV · AI · LG · Aug 19, 2024

3D-Aware Instance Segmentation and Tracking in Egocentric Videos

arXiv:2408.09860v2 · 4 citations · h-index: 105
AI Analysis

The paper addresses challenges in 3D scene understanding for egocentric videos, which matter for applications such as robotics and AR. The contribution is incremental, however, since it adds 3D awareness to existing segmentation and tracking methods.

This paper tackles the problem of instance segmentation and tracking in egocentric videos by introducing a 3D-aware approach that integrates scene geometry and temporal cues, achieving significant improvements such as a 7-point increase in Association Accuracy and a 73-80% reduction in ID switches compared to state-of-the-art 2D methods.
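One of the reported gains is the drop in ID switches. To make that metric concrete, here is a minimal, simplified CLEAR-MOT-style ID-switch counter. It assumes the per-frame matching between ground-truth and predicted instances has already been computed; the function names and toy data are hypothetical, and this is not the paper's evaluation code.

```python
def count_id_switches(frames):
    """Count identity switches over a sequence of per-frame matches.

    `frames` is a list of dicts {gt_id: pred_id} holding the ground-truth/prediction
    pairs matched in each frame (hypothetical input format). A switch is counted when
    a ground-truth object is matched to a different predicted ID than the one it was
    last matched to, even if it went unmatched (e.g. occluded) in between.
    """
    last_pred = {}
    switches = 0
    for matches in frames:
        for gt_id, pred_id in matches.items():
            if gt_id in last_pred and last_pred[gt_id] != pred_id:
                switches += 1
            last_pred[gt_id] = pred_id
    return switches


def percent_reduction(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Toy sequence: object 1 is unmatched in the third frame, re-emerges with the
# wrong ID "b", then recovers "a"; object 2 appears only once.
frames = [{1: "a"}, {1: "a"}, {}, {1: "b"}, {1: "a"}, {2: "c"}]
print(count_id_switches(frames))  # 2
print(percent_reduction(10, 2))  # 80.0
```

A reduction "from 10 to 2 switches" would correspond to an 80% reduction in this sense; the counts here are illustrative only.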

Egocentric videos present unique challenges for 3D scene understanding due to rapid camera motion, frequent object occlusions, and limited object visibility. This paper introduces a novel approach to instance segmentation and tracking in first-person video that leverages 3D awareness to overcome these obstacles. Our method integrates scene geometry, 3D object centroid tracking, and instance segmentation to create a robust framework for analyzing dynamic egocentric scenes. By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches. Extensive evaluations on the challenging EPIC Fields dataset demonstrate significant improvements across a range of tracking and segmentation consistency metrics. Specifically, our method outperforms the next-best-performing approach by 7 points in Association Accuracy (AssA) and 4.5 points in IDF1 score, while reducing the number of ID switches by 73% to 80% across various object categories. Leveraging our tracked instance segmentations, we showcase downstream applications in 3D object reconstruction and amodal video object segmentation in these egocentric settings.
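The abstract describes tracking objects by their 3D centroids instead of their 2D image positions. The sketch below illustrates why that helps under camera motion, assuming a pinhole camera with known intrinsics, a per-frame metric depth map, and camera-to-world poses (all hypothetical inputs here). The greedy nearest-centroid matching and the 0.3 m gate are illustrative simplifications, not the paper's association rule.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    centroid: tuple  # (x, y, z) in world coordinates, metres (assumed)


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at metric depth into camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def cam_to_world(p, R, t):
    """Apply a camera-to-world pose: p_world = R @ p_cam + t (R is a 3x3 nested list)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def mask_centroid_3d(mask_pixels, depth_map, intrinsics, R, t):
    """Mean world-space point over the mask pixels that have valid (positive) depth."""
    fx, fy, cx, cy = intrinsics
    points = [
        cam_to_world(backproject(u, v, depth_map[v][u], fx, fy, cx, cy), R, t)
        for (u, v) in mask_pixels
        if depth_map[v][u] > 0
    ]
    if not points:
        return None  # no usable depth inside the mask; filter before associating
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def associate(tracks, centroids, max_dist=0.3):
    """Greedily match detections to tracks by 3D centroid distance.

    Returns {detection_index: track_id}; unmatched detections open new tracks.
    """
    candidates = sorted(
        (math.dist(track.centroid, c), ti, di)
        for ti, track in enumerate(tracks)
        for di, c in enumerate(centroids)
    )
    used_tracks, assignment = set(), {}
    for dist, ti, di in candidates:
        if dist > max_dist:
            break  # candidates are sorted, so every remaining pair is farther
        if ti in used_tracks or di in assignment:
            continue
        used_tracks.add(ti)
        assignment[di] = tracks[ti].track_id
        tracks[ti].centroid = centroids[di]  # update with the latest observation
    next_id = max((track.track_id for track in tracks), default=-1) + 1
    for di, c in enumerate(centroids):
        if di not in assignment:
            tracks.append(Track(next_id, c))
            assignment[di] = next_id
            next_id += 1
    return assignment


# Toy example: the camera translates 0.4 m along x between two frames.
K = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy (illustrative values)
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
depth = [[2.0] * 640 for _ in range(480)]  # flat 2 m depth map (illustrative)

tracks = []
c1 = mask_centroid_3d([(420, 240), (430, 240)], depth, K, I3, (0.0, 0.0, 0.0))
print(associate(tracks, [c1]))  # {0: 0}
c2 = mask_centroid_3d([(320, 240), (330, 240)], depth, K, I3, (0.4, 0.0, 0.0))
print(associate(tracks, [c2, (5.0, 0.0, 2.0)]))  # {0: 0, 1: 1}
```

The object shifts by 100 pixels between the two frames, yet both masks back-project to the same world-space centroid, so it keeps ID 0. A 2D tracker would have to absorb that jump with an image-space heuristic. A real system would also need an optimal assignment (e.g. Hungarian matching) and handling for tracks that disappear.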
