CV | AI | May 23, 2024

Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning

arXiv:2405.14195v1 | 1 citation | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a practical limitation of RGB-D tracking in computer vision applications, but it appears incremental because it builds on existing multi-task learning approaches.

The paper tackles the problem of object tracking's dependency on real depth inputs by proposing MDETrack, a method that adds self-supervised monocular depth estimation as an auxiliary task during training. Results show improved tracking accuracy without real depth, though specific numbers are not provided.

RGB-D tracking significantly improves the accuracy of object tracking. However, its dependency on real depth inputs and the complexity of multi-modal fusion limit its applicability across scenarios. The use of depth information in RGB-D tracking inspired us to propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand scene depth through supervised or self-supervised auxiliary Monocular Depth Estimation learning. The output of MDETrack's unified feature extractor is fed to both a tracking head and an auxiliary depth estimation head, which sit side by side. The auxiliary module is discarded at inference, so inference speed is unchanged. We evaluated our models with various training strategies on multiple datasets, and the results show improved tracking accuracy even without real depth. These findings highlight the potential of depth estimation for enhancing object tracking performance.
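The training and inference layout described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyMultiTaskTracker`, its linear "heads", the squared-error losses, and the `depth_weight` loss weight are all hypothetical stand-ins chosen so that the example runs with the standard library alone. The `depth_target` argument is a placeholder for whatever supervised or self-supervised depth signal is used during training. The sketch shows only the structural idea: one shared feature extractor, two heads during training, and the depth head skipped at inference.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = List[float]


def dot(w: Vector, x: Vector) -> float:
    """Plain dot product, standing in for a learned layer."""
    return sum(a * b for a, b in zip(w, x))


@dataclass
class ToyMultiTaskTracker:
    """Hypothetical toy model: shared extractor plus two heads."""
    extractor_w: List[Vector]  # shared feature extractor, one row per feature
    track_w: Vector            # tracking head weights
    depth_w: Vector            # auxiliary depth head weights (training only)
    depth_weight: float = 0.5  # assumed weight of the auxiliary loss

    def features(self, x: Vector) -> Vector:
        # Shared features with a ReLU, used by both heads.
        return [max(0.0, dot(row, x)) for row in self.extractor_w]

    def forward(self, x: Vector, training: bool) -> Tuple[float, Optional[float]]:
        feats = self.features(x)
        track = dot(self.track_w, feats)
        # The auxiliary head runs only during training; at inference it is
        # skipped, so the tracking path costs the same as a plain tracker.
        depth = dot(self.depth_w, feats) if training else None
        return track, depth

    def training_loss(self, x: Vector, track_target: float,
                      depth_target: float) -> float:
        track, depth = self.forward(x, training=True)
        track_loss = (track - track_target) ** 2
        depth_loss = (depth - depth_target) ** 2
        # Both losses flow back into the shared extractor in a real network;
        # here we only compute the combined scalar.
        return track_loss + self.depth_weight * depth_loss


model = ToyMultiTaskTracker(
    extractor_w=[[1.0, 0.0], [0.0, 1.0]],
    track_w=[1.0, 1.0],
    depth_w=[0.5, 0.5],
)
prediction, aux = model.forward([1.0, 2.0], training=False)
```

In this sketch `aux` is `None` at inference, which mirrors the claim that dropping the auxiliary module leaves inference cost unchanged. In a real network the benefit would come from the depth loss shaping the shared features during training, which the toy example does not attempt to demonstrate.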

