Real-time Full-stack Traffic Scene Perception for Autonomous Driving with Roadside Cameras
This work addresses infrastructure-assisted autonomous driving by providing a practical, efficient perception system for real-time traffic monitoring, although its modular design is largely incremental.
The authors tackle real-time traffic scene perception for autonomous driving with roadside cameras by proposing a modular framework that decouples detection from localization, so the system can be trained with only 2D annotations. The system is deployed at a roundabout in Ann Arbor and achieves an end-to-end delay of less than 20 ms on low-power edge hardware.
We propose a novel and pragmatic framework for traffic scene perception with roadside cameras. The proposed framework covers the full stack of the roadside perception pipeline for infrastructure-assisted autonomous driving, including object detection, object localization, object tracking, and multi-camera information fusion. Unlike previous vision-based perception frameworks, which rely upon depth offsets or 3D annotations during training, we adopt a modular decoupling design and introduce a landmark-based 3D localization method, in which detection and localization are decoupled so that the model can be trained using only 2D annotations. The proposed framework applies to both optical and thermal cameras with either pinhole or fish-eye lenses. Our framework is deployed at a two-lane roundabout at Ellsworth Rd. and State St., Ann Arbor, MI, USA, providing 24/7 real-time traffic flow monitoring and high-precision vehicle trajectory extraction. The whole system runs efficiently on a low-power edge computing device, with an all-component end-to-end delay of less than 20 ms.
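Why decoupling lets a 2D-only detector produce ground positions can be illustrated with a minimal sketch. It assumes a pinhole (or already undistorted) camera, a locally flat road surface, and four surveyed landmarks whose pixel and ground coordinates are both known. A planar homography fitted to those landmarks then maps the bottom-center of any 2D detection box to a ground position, so the detector itself never needs 3D labels. The homography fit, the bottom-center contact-point heuristic, and the names `fit_homography`, `pixel_to_ground`, and `box_to_ground` are hypothetical illustrations, not the authors' landmark-based method; fish-eye images would need undistortion or a denser pixel-to-ground mapping first.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate landmark configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_homography(pixels: Sequence[Point], ground: Sequence[Point]) -> List[float]:
    """Fit a pixel->ground homography from exactly 4 landmark pairs (h33 fixed to 1)."""
    if len(pixels) != 4 or len(ground) != 4:
        raise ValueError("this sketch expects exactly 4 landmarks")
    a, b = [], []
    for (u, v), (x, y) in zip(pixels, ground):
        # x * (h31 u + h32 v + 1) = h11 u + h12 v + h13, rearranged to be linear in h
        a.append([u, v, 1.0, 0.0, 0.0, 0.0, -u * x, -v * x])
        b.append(x)
        a.append([0.0, 0.0, 0.0, u, v, 1.0, -u * y, -v * y])
        b.append(y)
    return _solve(a, b)  # [h11, h12, h13, h21, h22, h23, h31, h32]


def pixel_to_ground(h: List[float], u: float, v: float) -> Point:
    """Project one pixel onto the ground plane using the fitted homography."""
    w = h[6] * u + h[7] * v + 1.0
    return ((h[0] * u + h[1] * v + h[2]) / w, (h[3] * u + h[4] * v + h[5]) / w)


def box_to_ground(h: List[float], box: Tuple[float, float, float, float]) -> Point:
    """Localize a 2D box (x1, y1, x2, y2) by its bottom-center, assumed to touch the road."""
    x1, _, x2, y2 = box
    return pixel_to_ground(h, (x1 + x2) / 2.0, y2)


if __name__ == "__main__":
    landmarks_px = [(0, 0), (100, 0), (100, 100), (0, 100)]
    landmarks_m = [(5, -3), (15, -3), (15, 17), (5, 17)]  # made-up survey coordinates
    H = fit_homography(landmarks_px, landmarks_m)
    print(box_to_ground(H, (40, 20, 60, 50)))
```

In this sketch the detector and the localizer share nothing except the box coordinates, which mirrors the decoupling the abstract describes: retraining the detector or moving to a new camera only requires re-fitting the landmark mapping, not collecting 3D annotations.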