Visual object tracking performance measures revisited
This work addresses inconsistent evaluation in visual tracking research and pushes towards a homogenized methodology that makes tracker comparisons more reliable. It is incremental: it refines existing methodologies rather than introducing a new paradigm.
The paper tackles the lack of consensus in performance measures for visual object tracking, which complicates cross-paper comparisons and risks biased results. It analyzes existing measures theoretically and experimentally, narrowing them down to two complementary ones for accuracy and robustness, which have been adopted by VOT challenges.
Visual tracking evaluation features a large variety of performance measures and suffers from a lack of consensus about which measures should be used in experiments. This makes cross-paper tracker comparison difficult. Furthermore, as some measures may be less effective than others, the tracking results may be skewed or biased towards particular tracking aspects. In this paper we revisit the popular performance measures and tracker performance visualizations and analyze them theoretically and experimentally. We show that several measures are equivalent in terms of the information they provide for tracker comparison and, crucially, that some are more brittle than others. Based on our analysis we narrow down the set of potential measures to only two complementary ones, describing accuracy and robustness, thus pushing towards homogenization of the tracker evaluation methodology. These two measures can be intuitively interpreted and visualized and have been employed by the recent Visual Object Tracking (VOT) challenges as the foundation for the evaluation methodology.
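To make the accuracy/robustness pair concrete, the following is a minimal sketch and not the paper's or the VOT toolkit's implementation. It assumes that accuracy is taken as the average region overlap (intersection over union) on frames where the tracker is still on target, and that robustness is taken as the number of failure events, where a failure is a drop to zero overlap. The text above does not spell out these definitions, so they are assumptions, as are the names `Box`, `overlap`, and `accuracy_and_robustness` and the `(x, y, width, height)` box format. The sketch also omits the re-initialization step that a challenge-style protocol would apply after each failure.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x, y, width, height).
Box = Tuple[float, float, float, float]


def overlap(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def accuracy_and_robustness(
    predicted: List[Box],
    ground_truth: List[Box],
    fail_threshold: float = 0.0,
) -> Tuple[float, int]:
    """Return (average overlap on tracked frames, number of failure events).

    Assumption: a frame counts as failed when its overlap is at or below
    fail_threshold; consecutive failed frames count as a single event.
    """
    overlaps = [overlap(p, g) for p, g in zip(predicted, ground_truth)]

    failures = 0
    previously_failed = False
    tracked = []
    for o in overlaps:
        failed = o <= fail_threshold
        if failed and not previously_failed:
            failures += 1  # new failure event starts here
        if not failed:
            tracked.append(o)
        previously_failed = failed

    accuracy = sum(tracked) / len(tracked) if tracked else 0.0
    return accuracy, failures


if __name__ == "__main__":
    truth = [(0, 0, 10, 10)] * 3
    pred = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
    acc, fails = accuracy_and_robustness(pred, truth)
    print(f"accuracy={acc:.3f}, failures={fails}")
```

Reporting the two numbers separately keeps them complementary: a tracker that drifts off target often is penalized in the failure count rather than having its average overlap diluted by zero-overlap frames.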