On The Stability of Video Detection and Tracking
This work fills a gap in video analysis for researchers and practitioners by providing a new metric for evaluating stability. The contribution is incremental: it builds on existing accuracy metrics and does not introduce a new detection or tracking method.
The paper addresses stability in video detection and tracking, which the authors report had not been studied before. It proposes a novel evaluation metric that decomposes stability into fragment error, center position error, and scale and ratio error, and shows that this metric has low correlation with accuracy metrics such as mAP.
In this paper, we study an important yet little-explored aspect of video detection and tracking: stability. Surprisingly, no prior work has studied it. We therefore begin by proposing a novel evaluation metric for video detection that considers both stability and accuracy. For accuracy, we extend the existing metric mean Average Precision (mAP). For stability, we decompose it into three terms: fragment error, center position error, and scale and ratio error. Each term represents one aspect of stability. Furthermore, we demonstrate that the stability metric has low correlation with the accuracy metric, so it indeed captures a different perspective of quality. Lastly, based on this metric, we evaluate several existing methods for video detection and show how they affect accuracy and stability. We believe our work can provide guidance and solid baselines for future research in related areas.
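To make the three stability terms concrete, the following is a minimal sketch for a single object trajectory. The abstract does not give the formulas, so every definition here is an illustrative assumption rather than the paper's metric: fragment error is taken as the fraction of consecutive frame pairs in which the detection status flips, center position error as the spread (population standard deviation) of the center offset normalized by ground-truth size, and scale and ratio error as the spread of the relative scale and relative aspect ratio. The function names and the `(cx, cy, w, h)` box layout are hypothetical.

```python
from math import sqrt
from statistics import pstdev

# Boxes are (cx, cy, w, h) tuples; this layout is an assumption for the sketch.

def fragment_error(detected):
    """Fraction of consecutive frame pairs where detection status flips.

    `detected` is a list of booleans, one per frame of a trajectory.
    """
    if len(detected) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(detected, detected[1:]) if a != b)
    return flips / (len(detected) - 1)

def center_position_error(pred, gt):
    """Spread of the center offset, normalized by ground-truth width/height.

    A constant offset yields zero: it is inaccurate but perfectly stable.
    """
    dx = [(p[0] - g[0]) / g[2] for p, g in zip(pred, gt)]
    dy = [(p[1] - g[1]) / g[3] for p, g in zip(pred, gt)]
    return pstdev(dx) + pstdev(dy)

def scale_ratio_error(pred, gt):
    """Spread of relative scale and relative aspect ratio over the trajectory."""
    scale = [sqrt((p[2] * p[3]) / (g[2] * g[3])) for p, g in zip(pred, gt)]
    ratio = [(p[2] / p[3]) / (g[2] / g[3]) for p, g in zip(pred, gt)]
    return pstdev(scale) + pstdev(ratio)

# Toy trajectory: ground truth is static; predictions are shifted by a
# constant 2 pixels in x, with identical size in every frame.
gt = [(50.0, 50.0, 10.0, 20.0)] * 4
steady = [(52.0, 50.0, 10.0, 20.0)] * 4
# Same average offset, but the center jitters from frame to frame.
jitter = [(50.0, 50.0, 10.0, 20.0), (54.0, 50.0, 10.0, 20.0)] * 2

frag = fragment_error([True, True, False, True, True])
steady_center = center_position_error(steady, gt)
jitter_center = center_position_error(jitter, gt)
steady_scale = scale_ratio_error(steady, gt)
```

Under these assumed definitions, the `steady` and `jitter` predictions have the same mean center offset, yet only `jitter` has a nonzero center position error. This mirrors the abstract's claim that stability captures a different perspective of quality than accuracy.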