Real-Time Segmentation Networks should be Latency Aware
This addresses the need for more accurate evaluation of real-time performance in segmentation systems for applications such as autonomous vehicles and robots, though the contribution is incremental because it modifies existing metrics and methods.
The authors tackle the problem that mean Intersection over Union (mIoU) fails to capture the real-time performance of segmentation networks. They propose a latency-aware task, in which the network predicts the segmentation map of the future input frame that will be current when processing finishes, together with an associated latency-aware metric, and show experimentally that their proposed network modifications perform better under this metric.
As scene segmentation systems reach visually accurate results, many recent papers focus on making these network architectures faster, smaller and more efficient. In particular, studies often aim at designing 'real-time' systems. Achieving this goal is particularly relevant in the context of real-time video understanding for autonomous vehicles and robots. In this paper, we argue that the commonly used performance metric of mean Intersection over Union (mIoU) does not fully capture the information required to estimate the true performance of these networks when they operate in 'real-time'. We propose a change of objective in the segmentation task, and an associated metric, that encapsulates this missing information: instead of segmenting the current input frame, the network predicts the future output segmentation map that will match the future input frame at the time when the network finishes processing. We introduce the associated latency-aware metric, from which we can determine a ranking. We measure the latency of several recent networks on different hardware and assess the performance of these networks on our proposed task. Finally, we propose improvements that help scene segmentation networks perform better on our task: using multi-frame inputs and increasing capacity in the initial convolutional layers.
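The abstract does not give the exact definition of the latency-aware metric, so the following is only a minimal sketch under stated assumptions. It assumes frames arrive at a constant rate `fps`, and that a prediction made from input frame i is scored against the ground truth of the most recent frame available once processing ends, i.e. frame i + floor(latency * fps). The function names `frame_offset` and `latency_aware_miou` are hypothetical, and mIoU is accumulated over the whole sequence (per-class intersection and union summed across frames) before averaging over classes.

```python
import math
from typing import List, Sequence


def frame_offset(latency_s: float, fps: float) -> int:
    """Frames that elapse while the network processes one input.

    Assumption (hypothetical convention): the output is compared with the
    latest frame whose timestamp is <= arrival time + latency.
    """
    # Small epsilon guards against float error, e.g. 0.29 * 100 = 28.999...
    return math.floor(latency_s * fps + 1e-9)


def latency_aware_miou(
    preds: Sequence[Sequence[int]],
    gts: Sequence[Sequence[int]],
    latency_s: float,
    fps: float,
    num_classes: int,
) -> float:
    """mIoU where preds[i] is scored against gts[i + offset].

    preds[i] is a flattened label map predicted from input frame i;
    gts[j] is the flattened ground-truth label map of frame j.
    Predictions whose target frame lies past the end of the sequence are skipped.
    """
    k = frame_offset(latency_s, fps)
    inter: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for i, pred in enumerate(preds):
        j = i + k
        if j >= len(gts):
            break
        for p, g in zip(pred, gts[j]):
            if p == g:
                # Correct pixel: counts once in both intersection and union of its class.
                inter[p] += 1
                union[p] += 1
            else:
                # Wrong pixel: enlarges the union of both the predicted and true class.
                union[p] += 1
                union[g] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy sequence: a 1-pixel object (class 1) moves one pixel per frame.
    frames = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    # A per-frame perfect segmenter, evaluated with and without latency.
    print(latency_aware_miou(frames, frames, 0.0, 25, 2))   # 1.0
    print(latency_aware_miou(frames, frames, 0.04, 25, 2))  # 0.25
```

In this toy sequence, a segmenter that is perfect on each frame scores an mIoU of 1.0 under the standard, latency-free evaluation, but only 0.25 once a one-frame processing delay (40 ms at 25 fps) is taken into account. This illustrates the argument above: a network that does not anticipate scene motion during its own processing time is penalized by a latency-aware metric.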