Detection of Active Emergency Vehicles using Per-Frame CNNs and Output Smoothing
This work addresses a specific perception challenge for self-driving vehicles, but its contribution is incremental: it combines an existing off-the-shelf CNN model with temporal smoothing of the per-frame outputs.
The paper tackles the problem of detecting active emergency vehicles (EVs), i.e. those with flashing lights, from a self-driving vehicle. It proposes a per-frame CNN followed by output smoothing, and explores further improvements through data augmentation and training with additional hard samples.
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
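The two-stage idea described above (per-frame scores, then a temporal smoother) can be illustrated with a minimal sketch. The abstract does not specify the form of the smoother, so the moving average used here is an assumption, not the authors' method. The function name `smooth_ev_scores`, the `window` and `threshold` parameters, and their default values are hypothetical, and the CNN is replaced by a list of precomputed per-frame probabilities.

```python
from collections import deque
from typing import Iterable, List


def smooth_ev_scores(
    frame_scores: Iterable[float],
    window: int = 4,         # assumed window length, in frames
    threshold: float = 0.4,  # assumed decision threshold on the averaged score
) -> List[bool]:
    """Return one active-EV decision per frame from a moving average.

    frame_scores stands in for the per-frame CNN outputs (probability that
    the actor is an active EV). Averaging over a short window lets frames
    where the flashing light is momentarily off inherit evidence from
    neighbouring frames, and suppresses isolated false positives.
    """
    if window < 1:
        raise ValueError("window must be >= 1")

    recent = deque(maxlen=window)  # keeps only the last `window` scores
    decisions = []
    for score in frame_scores:
        recent.append(score)
        # Early frames are averaged over however many scores are available.
        mean_score = sum(recent) / len(recent)
        decisions.append(mean_score >= threshold)
    return decisions


# A flashing light: the per-frame score alternates between high and low.
flashing = [0.9, 0.1, 0.9, 0.1, 0.9, 0.1]
# A single spurious high score on an otherwise inactive vehicle.
spike = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0]

print(smooth_ev_scores(flashing))
print(smooth_ev_scores(spike))
```

With these assumed settings, the alternating sequence stays classified as active in every frame, while the isolated spike never crosses the threshold. A real system would need to tune the window to the camera frame rate and the flash frequency of the lights.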