Toward Auto-evaluation with Confidence-based Category Relation-aware Regression
This work addresses the need for efficient model evaluation in machine learning, but the contribution is incremental: it builds on existing auto-evaluation methods by incorporating category-wise information.
The paper tackles the problem of auto-evaluating trained models on test datasets without human annotations. It proposes a method that uses confidence scores and category relations to predict overall and category-wise performance, and reports experiments that support its effectiveness.
Auto-evaluation aims to automatically evaluate a trained model on any test dataset without human annotations. Most existing methods use global statistics of features extracted by the model as the representation of a dataset. Such representations ignore the influence of the classification head and lose the model's category-wise confusion information. However, the ratios of instances assigned to different categories, together with their confidence scores, reflect how many instances in which categories are difficult for the model to classify, and thus carry significant indicators of both overall and category-wise performance. In this paper, we propose a Confidence-based Category Relation-aware Regression ($C^2R^2$) method. $C^2R^2$ divides all instances in a meta-set into different categories according to their confidence scores and extracts the global representation from them. For each category, $C^2R^2$ encodes its local confusion relations to other categories into a local representation. The overall and category-wise performances are regressed from the global and local representations, respectively. Extensive experiments show the effectiveness of our method.
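The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every concrete choice in it is an assumption: instances are grouped by their arg-max predicted category, the global representation is taken to be the per-category assignment ratios concatenated with the mean confidence per category, the local representation of a category is the mean softmax vector of the instances assigned to it, and a least-squares linear model stands in for whatever regressor the paper actually uses. All function names are hypothetical.

```python
import numpy as np


def global_representation(probs):
    """Assumed global representation of an unlabeled dataset.

    probs: (N, K) array of softmax outputs of the model being evaluated.
    Returns a vector of length 2K: per-category assignment ratios
    followed by the mean confidence of instances assigned to each category.
    """
    n, k = probs.shape
    pred = probs.argmax(axis=1)   # category each instance is assigned to
    conf = probs.max(axis=1)      # confidence score of that assignment
    counts = np.bincount(pred, minlength=k)
    ratios = counts / n
    conf_sums = np.bincount(pred, weights=conf, minlength=k)
    # Categories with no assigned instances get a mean confidence of 0.
    mean_conf = np.divide(conf_sums, counts, out=np.zeros(k), where=counts > 0)
    return np.concatenate([ratios, mean_conf])


def local_representation(probs, category):
    """Assumed local representation of one category.

    Mean softmax vector over the instances assigned to `category`; the
    off-diagonal entries indicate which other categories it is confused with.
    """
    k = probs.shape[1]
    mask = probs.argmax(axis=1) == category
    if not mask.any():
        return np.zeros(k)
    return probs[mask].mean(axis=0)


def fit_linear_regressor(reps, accuracies):
    """Least-squares fit from representations to known accuracies.

    A linear stand-in for the regressor; reps has one row per labeled
    meta-set sample, and accuracies holds the measured accuracy of each.
    """
    reps = np.asarray(reps, dtype=float)
    design = np.hstack([reps, np.ones((len(reps), 1))])  # add bias column
    weights, *_ = np.linalg.lstsq(design, np.asarray(accuracies, dtype=float), rcond=None)
    return weights


def predict_accuracy(weights, rep):
    """Predict accuracy for a new, unlabeled dataset representation."""
    return float(np.append(rep, 1.0) @ weights)
```

Under these assumptions, the overall accuracy would be regressed from `global_representation` computed on each labeled sample set of the meta-set, and each category-wise accuracy from the corresponding `local_representation`; at test time the same representations are computed on the unlabeled dataset and passed to `predict_accuracy`.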