FocusTrack: One-Stage Focus-and-Suppress Framework for 3D Point Cloud Object Tracking
This work addresses limitations of 3D point cloud tracking for applications such as autonomous driving by introducing a faster and more accurate method, though the contribution is incremental because it builds on existing motion-centric approaches.
The paper tackles error accumulation and computational bottlenecks in 3D point cloud object tracking by proposing FocusTrack, a one-stage framework that unifies motion-semantics co-modeling. It achieves new state-of-the-art performance on benchmarks such as KITTI, nuScenes, and Waymo while running at 105 FPS.
In 3D point cloud object tracking, motion-centric methods have emerged as a promising avenue due to their superior performance in modeling inter-frame motion. However, existing two-stage motion-based approaches suffer from two fundamental limitations: (1) error accumulation from decoupled optimization, because explicit foreground segmentation is performed before motion estimation, and (2) computational bottlenecks from sequential processing. To address these challenges, we propose FocusTrack, a novel one-stage tracking framework that unifies motion-semantics co-modeling through two core innovations: Inter-frame Motion Modeling (IMM) and Focus-and-Suppress Attention. The IMM module employs a temporal-difference siamese encoder to capture global motion patterns between adjacent frames. Focus-and-Suppress Attention enhances foreground semantics via motion-salient feature gating and suppresses background noise based on the temporal-aware motion context from IMM, without explicit segmentation. Building on these two designs, FocusTrack enables end-to-end training within a compact one-stage pipeline. Extensive experiments on prominent 3D tracking benchmarks, including KITTI, nuScenes, and Waymo, demonstrate that FocusTrack achieves new state-of-the-art (SOTA) performance while running at 105 FPS.
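The abstract describes IMM and Focus-and-Suppress Attention only at a high level. The toy sketch below illustrates the general idea under loudly stated assumptions, and it is not FocusTrack's actual architecture. A shared ("siamese") per-point encoder with made-up weights `W` and `B` is applied to two adjacent frames. Each frame is max-pooled into a global descriptor, and the difference of the two descriptors serves as the motion context. A per-point sigmoid gate stands in for the attention mechanism, amplifying or attenuating features without a hard segmentation mask. All names (`encode_point`, `motion_context`, `focus_and_suppress`) and layer choices are hypothetical, and plain Python lists are used instead of tensors.

```python
import math
from typing import List, Tuple

Vec = List[float]

# HYPOTHETICAL shared ("siamese") encoder weights: the same W and B are used
# for both frames. Maps a 3-D point to a 4-D feature. Values are arbitrary.
W: List[Vec] = [
    [0.5, -0.2, 0.1],
    [0.3, 0.4, -0.6],
    [-0.1, 0.2, 0.7],
    [0.6, 0.1, 0.2],
]
B: Vec = [0.0, 0.1, -0.1, 0.05]


def encode_point(p: Vec) -> Vec:
    """Per-point linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, p)) + b) for row, b in zip(W, B)]


def encode_frame(points: List[Vec]) -> List[Vec]:
    """Apply the shared encoder to every point of one frame."""
    return [encode_point(p) for p in points]


def global_feature(feats: List[Vec]) -> Vec:
    """Max-pool per-point features into one frame-level descriptor."""
    return [max(col) for col in zip(*feats)]


def motion_context(prev_pts: List[Vec], curr_pts: List[Vec]) -> Vec:
    """Assumed IMM stand-in: global feature at t minus global feature at t-1."""
    g_prev = global_feature(encode_frame(prev_pts))
    g_curr = global_feature(encode_frame(curr_pts))
    return [c - p for c, p in zip(g_curr, g_prev)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def focus_and_suppress(curr_pts: List[Vec], motion: Vec) -> Tuple[Vec, List[Vec]]:
    """Soft per-point gating driven by the motion context (assumed form).

    A gate near 1 keeps (focuses) a point's feature; a gate near 0
    attenuates (suppresses) it. No hard foreground mask is ever produced.
    """
    feats = encode_frame(curr_pts)
    gates = [sigmoid(sum(f * m for f, m in zip(feat, motion))) for feat in feats]
    gated = [[g * f for f in feat] for g, feat in zip(gates, feats)]
    return gates, gated


if __name__ == "__main__":
    prev = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.5]]
    curr = [[0.0, 0.0, 0.0], [1.5, 2.0, 0.5]]  # second point moved along x
    m = motion_context(prev, curr)
    gates, _ = focus_and_suppress(curr, m)
    print("motion context:", m)
    print("gates:", gates)
```

Because the gate is soft, no point is hard-assigned to foreground or background. When the two frames are identical, the motion context is zero and every gate equals 0.5. In the example above, the displaced point receives a larger gate than the stationary one. In a real model these weights would presumably be learned end to end, which is the property the one-stage design is meant to enable.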