CV AI May 23, 2024

Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning

arXiv:2405.14195v12.01 citations h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses a practical limitation of RGB-D tracking in computer vision applications, but it appears incremental because it builds on existing multi-task learning approaches.

The paper tackles RGB-D tracking's dependency on real depth inputs by proposing MDETrack, a method that adds supervised or self-supervised monocular depth estimation as an auxiliary task during training. The reported results show improved tracking accuracy even without real depth, though specific numbers are not provided.

RGB-D tracking significantly improves object tracking accuracy. However, its dependency on real depth inputs and the complexity of multi-modal fusion limit its applicability across scenarios. The use of depth information in RGB-D tracking inspired us to propose a new method, MDETrack, which trains a tracking network with the additional ability to understand scene depth through supervised or self-supervised auxiliary monocular depth estimation learning. The outputs of MDETrack's unified feature extractor are fed to a tracking head and an auxiliary depth estimation head, placed side by side. The auxiliary module is discarded at inference, so inference speed is unchanged. We evaluated our models with various training strategies on multiple datasets, and the results show improved tracking accuracy even without real depth. These findings highlight the potential of depth estimation for enhancing object tracking performance.
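The layout described in the abstract (one shared feature extractor, a tracking head, and an auxiliary depth head that is dropped at inference) can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyMDETrack`, the layer sizes, the `aux_weight` parameter, and the weighted-sum loss are all hypothetical, and `depth_target` stands in for whichever supervised or self-supervised depth signal is used during training. Plain Python lists replace a real deep learning framework to keep the example self-contained.

```python
import random


class Linear:
    """Toy dense layer y = Wx + b on plain Python lists (illustrative only)."""

    def __init__(self, n_in, n_out, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]


class ToyMDETrack:
    """Hypothetical stand-in: shared extractor feeding two side-by-side heads."""

    def __init__(self, n_in=8, n_feat=4, seed=0):
        rng = random.Random(seed)
        self.backbone = Linear(n_in, n_feat, rng)     # unified feature extractor
        self.track_head = Linear(n_feat, 4, rng)      # box as (x, y, w, h)
        self.depth_head = Linear(n_feat, n_in, rng)   # auxiliary, training only
        self.depth_calls = 0                          # counts depth-head runs

    def forward(self, x, training):
        feat = [max(0.0, v) for v in self.backbone(x)]  # ReLU on shared features
        box = self.track_head(feat)
        if not training:
            # Inference: the auxiliary head is skipped, so it adds no cost.
            return box, None
        self.depth_calls += 1
        return box, self.depth_head(feat)


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def joint_loss(box, box_gt, depth, depth_target, aux_weight=0.5):
    """Assumed combination: tracking loss plus a weighted auxiliary depth loss."""
    return mse(box, box_gt) + aux_weight * mse(depth, depth_target)


model = ToyMDETrack()
x = [0.1] * 8

# Training step: both heads run and contribute to the loss.
train_box, train_depth = model.forward(x, training=True)
loss = joint_loss(train_box, [0.5, 0.5, 0.2, 0.2], train_depth, [1.0] * 8)

# Inference: only the tracking head runs; the predicted box is unchanged.
infer_box, infer_depth = model.forward(x, training=False)
```

Because the tracking head reads the same shared features in both modes, removing the depth head changes nothing about the inference path. Any benefit from the auxiliary task would have to come through the weights of the shared extractor learned during training.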
