InterTrack: Interaction Transformer for 3D Multi-Object Tracking
InterTrack targets data association in 3D multi-object tracking, which autonomous vehicles need for well-informed motion planning in dynamic environments. Its reported gains are largest on classes with small physical sizes and on clustered objects.
The paper tackles the challenge of associating existing tracks with new detections in 3D multi-object tracking for autonomous vehicles, particularly in dense scenes. It proposes InterTrack, which uses an Interaction Transformer to produce discriminative object representations for data association. As of submission, InterTrack ranks 1st in overall AMOTA on the nuScenes 3D MOT benchmark among methods using CenterPoint detections.
3D multi-object tracking (MOT) is a key problem for autonomous vehicles, required to perform well-informed motion planning in dynamic environments. Particularly for densely occupied scenes, associating existing tracks to new detections remains challenging as existing systems tend to omit critical contextual information. Our proposed solution, InterTrack, introduces the Interaction Transformer for 3D MOT to generate discriminative object representations for data association. We extract state and shape features for each track and detection, and efficiently aggregate global information via attention. We then perform a learned regression on each track/detection feature pair to estimate affinities, and use a robust two-stage data association and track management approach to produce the final tracks. We validate our approach on the nuScenes 3D MOT benchmark, where we observe significant improvements, particularly on classes with small physical sizes and clustered objects. As of submission, InterTrack ranks 1st in overall AMOTA among methods using CenterPoint detections.
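The pipeline described above (attention-based aggregation, pairwise affinity regression, then association) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the function names (`attend`, `pairwise_affinity`, `greedy_match`, `associate`), the single-head attention over the concatenated track and detection features, the linear-plus-sigmoid regression head with parameters `w` and `b`, and the thresholded greedy matching. The abstract does not specify the two-stage association or the track management procedure, so a single greedy pass stands in for them here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)  # (Q, K)
    return weights @ values  # (Q, D)


def pairwise_affinity(track_feats, det_feats, w, b):
    """Hypothetical regression head: sigmoid of a linear map applied to
    every concatenated (track, detection) feature pair."""
    num_tracks, num_dets = track_feats.shape[0], det_feats.shape[0]
    pairs = np.concatenate(
        [
            np.repeat(track_feats[:, None, :], num_dets, axis=1),
            np.repeat(det_feats[None, :, :], num_tracks, axis=0),
        ],
        axis=-1,
    )  # (T, N, 2D)
    logits = pairs @ w + b  # (T, N)
    return 1.0 / (1.0 + np.exp(-logits))


def greedy_match(affinity, threshold):
    """Assumed stand-in for the association step: take pairs in order of
    decreasing affinity, skipping tracks or detections already matched,
    and stop once the affinity drops below the threshold."""
    matches, used_tracks, used_dets = [], set(), set()
    for flat in np.argsort(-affinity, axis=None):
        t, d = (int(i) for i in np.unravel_index(flat, affinity.shape))
        if affinity[t, d] < threshold:
            break
        if t in used_tracks or d in used_dets:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_dets.add(d)
    return matches


def associate(track_feats, det_feats, w, b, threshold=0.5):
    """Aggregate global context via attention, then score and match pairs."""
    # Assumption: every object attends over all tracks and detections.
    context = np.concatenate([track_feats, det_feats], axis=0)
    track_ctx = track_feats + attend(track_feats, context, context)
    det_ctx = det_feats + attend(det_feats, context, context)
    affinity = pairwise_affinity(track_ctx, det_ctx, w, b)
    return affinity, greedy_match(affinity, threshold)
```

In this sketch, `track_feats` and `det_feats` are arrays of shape `(T, D)` and `(N, D)` standing in for the state and shape features, and `w` has shape `(2D,)`. Tracks and detections left unmatched by `greedy_match` would be handed to track management (for example, track termination or creation), which is outside the scope of the sketch.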