Context-Aware Streaming Perception in Dynamic Environments
This work addresses streaming perception for autonomous driving, offering an incremental improvement over prior work by optimizing accuracy per environment context rather than on average.
The paper tackles the accuracy drop that real-time vision applications such as autonomous driving suffer because the ground truth changes during inference, and it proposes a method that maximizes streaming accuracy for every environment context. The approach improves tracking performance (S-MOTA) by 7.4% over the conventional static approach, and these gains are additive to advances in offline accuracy.
Efficient vision methods maximize accuracy under a latency budget, but they evaluate accuracy offline, one image at a time. Real-time vision applications such as autonomous driving instead operate in streaming settings, where the ground truth changes between the start and the end of inference, which causes a significant accuracy drop. A recent work therefore proposed to maximize accuracy in streaming settings on average. In this paper, we propose to maximize streaming accuracy for every environment context. We posit that scenario difficulty influences the initial (offline) accuracy difference, while obstacle displacement in the scene affects the subsequent accuracy degradation. Our method, Octopus, uses these scenario properties to select configurations that maximize streaming accuracy at test time. Octopus improves tracking performance (S-MOTA) by 7.4% over the conventional static approach. Further, this improvement comes in addition to, and not instead of, advances in offline accuracy.
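The intuition behind per-context selection can be illustrated with a minimal Python sketch. This is not the Octopus implementation. It assumes a toy linear model in which streaming accuracy equals offline accuracy minus a penalty proportional to obstacle displacement multiplied by inference latency. The names `Config`, `streaming_score`, `select_config`, and `decay`, along with every number used, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A hypothetical perception configuration (e.g. a model and input size)."""
    name: str
    latency_s: float  # assumed inference latency in seconds


def streaming_score(offline_acc, latency_s, displacement, decay=1.0):
    """Toy estimate of streaming accuracy (an assumption, not the paper's model).

    offline_acc:  accuracy of the configuration in the current scenario,
                  which depends on scenario difficulty.
    displacement: how fast obstacles move in the scene (arbitrary units).
    The penalty grows with how far obstacles move while inference runs.
    """
    return offline_acc - decay * displacement * latency_s


def select_config(configs, offline_acc_by_name, displacement, decay=1.0):
    """Pick the configuration with the highest estimated streaming accuracy."""
    return max(
        configs,
        key=lambda c: streaming_score(
            offline_acc_by_name[c.name], c.latency_s, displacement, decay
        ),
    )


# Two made-up configurations: one fast, one slower but more accurate offline.
CONFIGS = [Config("fast", 0.03), Config("accurate", 0.10)]
OFFLINE_ACC = {"fast": 0.60, "accurate": 0.75}

slow_scene = select_config(CONFIGS, OFFLINE_ACC, displacement=0.5)
fast_scene = select_config(CONFIGS, OFFLINE_ACC, displacement=3.0)
print(slow_scene.name, fast_scene.name)
```

Under these assumed numbers, the slower configuration is preferred when obstacles barely move, and the faster one is preferred when displacement is large. A static choice would be suboptimal in one of the two contexts, which is the motivation for selecting configurations per context.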