CV | Jun 30, 2023

STTracker: Spatio-Temporal Tracker for 3D Single Object Tracking

arXiv:2306.17440v1 | 15 citations | h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses tracking accuracy in 3D computer vision for applications like autonomous driving, but the advance is incremental: it builds on existing methods by incorporating historical information.

The paper tackles 3D single object tracking by encoding spatio-temporal information from multi-frame point clouds and fusing features with a patch-based sparse attention mechanism, achieving competitive results of 62.6% on KITTI and 49.66% on NuScenes.

3D single object tracking with point clouds is a critical task in 3D computer vision. Previous methods usually take the last two frames as input, using the predicted box to obtain the template point cloud in the previous frame and the search-area point cloud in the current frame, and then apply similarity-based or motion-based methods to predict the current box. Although these methods achieve good tracking performance, they ignore the historical information of the target, which is important for tracking. In this paper, instead of two frames, we input multiple frames of point clouds to encode the spatio-temporal information of the target and implicitly learn its motion, building correlations among frames so that the target can be tracked efficiently in the current frame. Moreover, rather than fusing point features directly, we first crop the point cloud features into many patches, then use a sparse attention mechanism to encode patch-level similarity, and finally fuse the multi-frame features. Extensive experiments show that our method achieves competitive results on challenging large-scale benchmarks (62.6% on KITTI and 49.66% on NuScenes).
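The patch-then-attend idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that each frame's features are already laid out on a dense grid of shape (H, W, C), and it uses top-k selection as a simple stand-in for the sparse attention mechanism, whose exact form the abstract does not specify. All function names (`crop_into_patches`, `topk_sparse_attention`, `fuse_multi_frame`) and the parameters `patch` and `k` are hypothetical.

```python
import numpy as np


def crop_into_patches(feat, patch):
    """Split an (H, W, C) feature map into non-overlapping patch tokens.

    Each patch is flattened into one vector of length patch * patch * C.
    The dense-grid layout is an assumption made for illustration.
    """
    H, W, C = feat.shape
    assert H % patch == 0 and W % patch == 0, "grid must divide evenly"
    gh, gw = H // patch, W // patch
    x = feat.reshape(gh, patch, gw, patch, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * C)


def topk_sparse_attention(query, keys, values, k):
    """Let each query token attend only to its k most similar key tokens.

    Top-k selection is an assumed sparsity rule, not taken from the paper.
    """
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)                  # (Nq, Nk)
    idx = np.argpartition(scores, -k, axis=1)[:, -k:]     # k best keys per query
    top = np.take_along_axis(scores, idx, axis=1)         # (Nq, k)
    top = top - top.max(axis=1, keepdims=True)            # stabilise the softmax
    weights = np.exp(top)
    weights /= weights.sum(axis=1, keepdims=True)
    fused = np.einsum("qk,qkd->qd", weights, values[idx])  # weighted sum of values
    return fused, weights


def fuse_multi_frame(frames, patch=2, k=4):
    """Fuse features from T frames; the last frame is treated as current.

    frames: array of shape (T, H, W, C).
    Returns the fused patch tokens for the current frame and the attention weights.
    """
    tokens = [crop_into_patches(f, patch) for f in frames]
    query = tokens[-1]                        # patches of the current frame
    memory = np.concatenate(tokens, axis=0)   # patches of all frames
    fused, weights = topk_sparse_attention(query, memory, memory, k)
    return query + fused, weights             # residual connection


rng = np.random.default_rng(0)
frames = rng.normal(size=(3, 4, 4, 8))        # 3 frames, 4x4 grid, 8 channels
out, weights = fuse_multi_frame(frames, patch=2, k=4)
print(out.shape, weights.shape)
```

Attending over patch tokens rather than individual points shrinks the number of query-key pairs, and keeping only the top-k matches per query is one simple way to make the attention sparse; a real tracker would add learned projections and positional encodings on top of this.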

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
