What Does Rotation Prediction Tell Us about Classifier Accuracy under Varying Testing Environments?
This work addresses a practical challenge for the machine learning community: image annotations are expensive to obtain in real-world testing. The contribution is incremental, however, since it builds on existing multi-task learning approaches.
The paper tackles the problem of evaluating classifier accuracy on unlabeled test sets under varying environments by discovering a strong linear correlation (Pearson's r > 0.88) between semantic classification accuracy and rotation prediction accuracy, enabling performance estimation via linear regression.
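The regression step described above can be sketched as follows, assuming rotation prediction accuracy and classification accuracy have already been measured on a collection of labeled sample sets. The function names `fit_accuracy_regressor` and `estimate_accuracy` and all numbers are hypothetical placeholders for illustration; they are not the authors' code or results.

```python
import numpy as np

def fit_accuracy_regressor(rot_acc, cls_acc):
    """Fit cls_acc = w * rot_acc + b by least squares and report Pearson's r."""
    rot_acc = np.asarray(rot_acc, dtype=float)
    cls_acc = np.asarray(cls_acc, dtype=float)
    # Degree-1 polynomial fit returns (slope, intercept).
    w, b = np.polyfit(rot_acc, cls_acc, deg=1)
    # Off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
    r = np.corrcoef(rot_acc, cls_acc)[0, 1]
    return float(w), float(b), float(r)

def estimate_accuracy(rot_acc_unlabeled, w, b):
    """Predict classification accuracy from rotation accuracy, clipped to [0, 1]."""
    return float(np.clip(w * rot_acc_unlabeled + b, 0.0, 1.0))

# Hypothetical (rotation accuracy, classification accuracy) pairs, one per
# labeled sample set. These are made-up values, not measurements.
rot_acc = [0.40, 0.55, 0.63, 0.72, 0.85]
cls_acc = [0.35, 0.52, 0.58, 0.70, 0.82]

w, b, r = fit_accuracy_regressor(rot_acc, cls_acc)

# Rotation accuracy on an unlabeled test set needs no human annotation,
# so the fitted line can turn it into an accuracy estimate.
est = estimate_accuracy(0.60, w, b)
print(f"slope={w:.3f} intercept={b:.3f} r={r:.3f} estimate={est:.3f}")
```

The estimate is only as reliable as the linear fit; a low `r` on the labeled sample sets would suggest the regression should not be trusted for the unlabeled one.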
Understanding classifier decisions under novel environments is central to the community, and a common practice is evaluating them on labeled test sets. However, in real-world testing, image annotations are difficult and expensive to obtain, especially when the test environment is changing. A natural question then arises: given a trained classifier, can we evaluate its accuracy on varying unlabeled test sets? In this work, we train semantic classification and rotation prediction in a multi-task way. On a series of datasets, we report an interesting finding, i.e., the semantic classification accuracy exhibits a strong linear relationship with the accuracy of the rotation prediction task (Pearson's correlation r > 0.88). This finding allows us to utilize linear regression to estimate classifier performance from the accuracy of rotation prediction, which can be obtained on the test set through the freely generated rotation labels.
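To illustrate why rotation labels are free, here is a minimal sketch of generating them from unlabeled images. It assumes the four rotation classes are 0, 90, 180, and 270 degrees (the abstract does not state the angles) and that images are square arrays of shape `(N, H, W, C)`. The helper names `make_rotation_batch` and `rotation_accuracy`, and the `predict_rotation` callable, are hypothetical.

```python
import numpy as np

def make_rotation_batch(images):
    """Rotate every image by k * 90 degrees, k = 0..3, and label it with k.

    images: array of shape (N, H, W, C) with H == W (assumed square).
    Returns (rotated, labels) with shapes (4N, H, W, C) and (4N,).
    """
    images = np.asarray(images)
    # Rotate in the height/width plane; the batch and channel axes are untouched.
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    # Copies are stacked block by block, so labels are N zeros, N ones, ...
    labels = np.repeat(np.arange(4), len(images))
    return np.concatenate(rotated, axis=0), labels

def rotation_accuracy(predict_rotation, images):
    """Fraction of rotated copies whose rotation class is predicted correctly.

    predict_rotation: hypothetical callable mapping a batch of images to
    integer rotation classes in {0, 1, 2, 3}.
    """
    x, y = make_rotation_batch(images)
    pred = np.asarray(predict_rotation(x))
    return float(np.mean(pred == y))
```

Because the labels come from the transformation itself, `rotation_accuracy` can be computed on any test set without human annotation.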