SDVTracker: Real-Time Multi-Sensor Association and Tracking for Self-Driving Vehicles
This work addresses the critical need for robust, low-latency tracking in autonomous vehicles, particularly in crowded urban scenes, though it appears incremental because it builds on existing tracking frameworks and adds learned components.
The paper tackles the problem of accurate motion state estimation of Vulnerable Road Users (VRUs) for self-driving vehicles by introducing SDVTracker, which combines a deep learned model for association and state estimation with an Interacting Multiple Model (IMM) filter. The system significantly outperforms hand-engineered methods on a real-world dataset while running in less than 2.5 ms on CPU for a scene with 100 actors.
Accurate motion state estimation of Vulnerable Road Users (VRUs) is a critical requirement for autonomous vehicles that navigate in urban environments. Due to their computational efficiency, many traditional autonomy systems perform multi-object tracking using Kalman filters, which frequently rely on hand-engineered association. However, such methods fail to generalize to crowded scenes and multi-sensor modalities, often resulting in poor state estimates which cascade to inaccurate predictions. We present a practical and lightweight tracking system, SDVTracker, that uses a deep learned model for association and state estimation in conjunction with an Interacting Multiple Model (IMM) filter. The proposed tracking method is fast, robust, and generalizes across multiple sensor modalities and different VRU classes. In this paper, we detail a model that jointly optimizes both association and state estimation with a novel loss, an algorithm for determining ground-truth supervision, and a training procedure. We show this system significantly outperforms hand-engineered methods on a real-world urban driving dataset while running in less than 2.5 ms on CPU for a scene with 100 actors, making it suitable for self-driving applications where low latency and high accuracy are critical.
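For readers unfamiliar with the IMM filter mentioned above, the following is a minimal sketch of the textbook mode-probability update that an IMM filter performs at each step. It is not the authors' implementation, and it leaves out the per-model Kalman filters and the learned association model entirely. The function name `imm_mode_probabilities`, its parameters, and the two-model example values are assumptions made for illustration only.

```python
from typing import List, Sequence


def imm_mode_probabilities(
    prior: Sequence[float],
    transition: Sequence[Sequence[float]],
    likelihoods: Sequence[float],
) -> List[float]:
    """Textbook IMM mode-probability update (illustrative, hypothetical names).

    prior[i]         : probability that motion model i was active at the last step
    transition[i][j] : probability of switching from model i to model j
    likelihoods[j]   : likelihood of the new measurement under model j
    """
    n = len(prior)

    # Predict: propagate the mode probabilities through the Markov switching matrix.
    predicted = [
        sum(transition[i][j] * prior[i] for i in range(n)) for j in range(n)
    ]

    # Update: weight each predicted mode probability by its measurement likelihood.
    unnormalized = [likelihoods[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)

    # If every model assigns zero likelihood, keep the prediction unchanged.
    if total <= 0.0:
        return predicted

    return [u / total for u in unnormalized]


# Example with two assumed motion models (say, "stationary" and "moving").
example = imm_mode_probabilities(
    prior=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    likelihoods=[0.8, 0.2],
)
```

In a full IMM filter these probabilities would then weight the per-model state estimates into a single fused estimate, which lets the tracker handle actors that switch between motion regimes.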