PillarTrack: Boosting Pillar Representation for Transformer-based 3D Single Object Tracking on Point Clouds
This work addresses a critical problem in robotics and autonomous driving by improving tracking accuracy, though it appears incremental, since it builds on existing Transformer-based methods and pillar representations.
The paper tackles information loss in point-based 3D single object tracking by proposing PillarTrack, a pillar-based framework that achieves comparable performance on the KITTI and nuScenes datasets and significantly improves over its baseline.
LiDAR-based 3D single object tracking (3D SOT) is a critical task in robotics and autonomous driving. Existing 3D SOT methods typically follow a point-based processing pipeline, in which the re-sampling operation invariably leads to either redundant or missing information, thereby degrading performance. To address these issues, we propose PillarTrack, a novel pillar-based 3D SOT framework. First, we transform sparse point clouds into dense pillars to preserve local and global geometry. Second, we propose a Pyramid-Encoded Pillar Feature Encoder (PE-PFE) to make pillar features more robust to translation, rotation, and scale. Third, we present an efficient Transformer-based backbone designed from the perspective of modality differences. Finally, we build PillarTrack on the above designs. Extensive experiments show that our method achieves comparable performance on the KITTI and nuScenes datasets and significantly enhances the performance of the baseline.
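The first step described in the abstract, turning a sparse point cloud into dense pillars, can be illustrated with a minimal sketch in the spirit of PointPillars. This is not PillarTrack's implementation: the abstract does not specify the PE-PFE pyramid encoding or the Transformer backbone. The helpers `pillarize`, `encode_pillar`, and `scatter_to_bev` are hypothetical; the 8-channel per-point decoration (raw coordinates, offsets from the pillar mean, offsets from the pillar centre) is borrowed from PointPillars with reflectance omitted; and a plain channel-wise max-pool stands in for the learned feature encoder.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres
Index = Tuple[int, int]             # (ix, iy) pillar cell on the BEV grid


def pillarize(points: List[Point], x_range: Tuple[float, float],
              y_range: Tuple[float, float], size: float):
    """Hypothetical helper: group points into vertical pillars on a BEV grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    nx = int(round((x_max - x_min) / size))
    ny = int(round((y_max - y_min) / size))
    pillars: Dict[Index, List[Point]] = defaultdict(list)
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # points outside the grid are dropped
        # Clamp to guard against floating-point rounding at the upper edge.
        ix = min(int((x - x_min) // size), nx - 1)
        iy = min(int((y - y_min) // size), ny - 1)
        pillars[(ix, iy)].append((x, y, z))
    return dict(pillars), (nx, ny)


def encode_pillar(pts: List[Point], ix: int, iy: int, x_min: float,
                  y_min: float, size: float) -> List[float]:
    """PointPillars-style decoration followed by a channel-wise max-pool."""
    n = len(pts)
    xc = sum(p[0] for p in pts) / n  # arithmetic mean of the pillar's points
    yc = sum(p[1] for p in pts) / n
    zc = sum(p[2] for p in pts) / n
    px = x_min + (ix + 0.5) * size   # geometric centre of the pillar cell
    py = y_min + (iy + 0.5) * size
    decorated = [[x, y, z, x - xc, y - yc, z - zc, x - px, y - py]
                 for x, y, z in pts]
    # A learned encoder would apply a shared MLP before pooling;
    # this sketch pools the raw decorated features instead.
    return [max(col) for col in zip(*decorated)]


def scatter_to_bev(points: List[Point], x_range, y_range, size: float,
                   channels: int = 8):
    """Return a dense ny x nx x C grid; empty pillars are zero-filled."""
    pillars, (nx, ny) = pillarize(points, x_range, y_range, size)
    grid = [[[0.0] * channels for _ in range(nx)] for _ in range(ny)]
    for (ix, iy), pts in pillars.items():
        grid[iy][ix] = encode_pillar(pts, ix, iy, x_range[0], y_range[0], size)
    return grid


# Usage: a toy cloud on a 1 m x 1 m area with 0.5 m pillars.
cloud = [(0.1, 0.1, 0.0), (0.2, 0.3, 1.0), (0.7, 0.2, 0.5), (5.0, 5.0, 0.0)]
bev = scatter_to_bev(cloud, (0.0, 1.0), (0.0, 1.0), 0.5)
print(len(bev), len(bev[0]), len(bev[0][0]))  # 2 2 8
```

In this sketch every in-range point is assigned to exactly one pillar rather than being re-sampled, and empty cells are zero-filled, which is what turns the sparse cloud into a dense 2D map that a 2D or Transformer backbone can consume.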