Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation
This work gives computer vision researchers a principled benchmark for understanding and improving pose estimation algorithms. Its contribution is incremental, since it builds on existing methods.
The authors address error diagnosis in multi-instance pose estimation by defining three error classes and analyzing their impact on algorithm performance. They find that the types of error depend heavily on the number of keypoints and on clutter, and they apply their tools to compare leading methods on the COCO Dataset.
We propose a new method to analyze the impact of errors in algorithms for multi-instance pose estimation and a principled benchmark that can be used to compare them. We define and characterize three classes of errors (localization, scoring, and background), and study how they are influenced by instance attributes and how they affect an algorithm's performance. We apply our technique to compare the two leading methods for human pose estimation on the COCO Dataset and to measure the sensitivity of pose estimation to instance size, the type and number of visible keypoints, clutter due to multiple instances, and the relative score of instances. The performance of algorithms, and the types of error they make, depend strongly on all of these variables, but most of all on the number of keypoints and the clutter. The analysis and software tools we propose offer a novel and insightful approach for understanding the behavior of pose estimation algorithms and an effective method for measuring their strengths and weaknesses.
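To make the error classes more concrete, the following is a minimal sketch, not the software described above. It assumes that detections are compared with ground truth using the Object Keypoint Similarity (OKS) measure from the COCO keypoint evaluation, which the abstract itself does not state. The function names `oks` and `label_detection` and both thresholds are hypothetical and chosen only for illustration. Scoring errors depend on the confidence ranking of detections and are not covered here.

```python
import math


def oks(gt, dt, visible, area, sigmas):
    """Object Keypoint Similarity between one ground-truth pose and one detection.

    gt, dt  : sequences of (x, y) keypoint coordinates, in the same order
    visible : per-keypoint flags; only labeled (truthy) keypoints are scored
    area    : ground-truth instance area, used as the squared scale
    sigmas  : per-keypoint falloff constants
    """
    terms = []
    for (gx, gy), (dx, dy), v, s in zip(gt, dt, visible, sigmas):
        if not v:
            continue  # unlabeled keypoints do not contribute
        d2 = (gx - dx) ** 2 + (gy - dy) ** 2
        # Gaussian falloff with variance (2 * sigma)^2, scaled by instance area
        terms.append(math.exp(-d2 / (2.0 * area * (2.0 * s) ** 2)))
    return sum(terms) / len(terms) if terms else 0.0


def label_detection(best_oks, match_thresh=0.5, good_thresh=0.85):
    """Coarse label from the best OKS of a detection against any ground truth.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    if best_oks < match_thresh:
        return "background"    # no ground-truth instance is matched
    if best_oks < good_thresh:
        return "localization"  # matched, but keypoints are imprecise
    return "good"
```

For example, `label_detection(oks(gt, dt, visible, area, sigmas))` assigns one coarse label to a single detection. A fuller diagnosis would also match detections to instances and examine how confidence scores order them.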