LG | Sep 14, 2022

Meta Pattern Concern Score: A Novel Evaluation Measure with Human Values for Multi-classifiers

arXiv:2209.06408v3 | h-index: 11
Originality: Incremental advance
AI Analysis

The paper addresses the need for evaluation metrics that reflect human values in safety-critical applications. The advance is incremental: it builds on existing metrics such as confusion-matrix-based measures and loss functions.

The paper tackles the evaluation of black-box classifiers under human values, such as penalizing errors according to their severity. It proposes the Meta Pattern Concern Score, an evaluation measure that can also be used to refine training by adjusting the learning rate. In a case study, the measure identifies a model with 0.53% fewer dangerous cases at a cost of only 0.04% in training accuracy, and a retrained model that on average achieves a 1.62% lower score and 0.36% fewer dangerous cases than the original.

While advanced classifiers are increasingly used in real-world safety-critical applications, how to properly evaluate black-box models under specific human values remains a concern in the community. Such human values include punishing error cases of different severity to different degrees and compromising on general performance to reduce specific dangerous cases. In this paper, we propose a novel evaluation measure named Meta Pattern Concern Score, based on an abstract representation of probabilistic predictions and an adjustable threshold for the concession in prediction confidence, to introduce human values into multi-classifiers. Technically, we learn from the advantages and disadvantages of two kinds of common metrics, namely the confusion matrix-based evaluation measures and the loss values, so that our measure is as effective as they are even on general tasks, and the cross-entropy loss becomes a special case of our measure in the limit. In addition, our measure can be used to refine model training by dynamically adjusting the learning rate. Experiments on four kinds of models and six datasets confirm the effectiveness and efficiency of our measure. A case study shows that it can not only find an ideal model that reduces dangerous cases by 0.53% while sacrificing only 0.04% of training accuracy, but also refine the learning rate to train a new model that on average outperforms the original one, with a 1.62% lower value of the measure and 0.36% fewer dangerous cases.
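The abstract does not give the formula for the Meta Pattern Concern Score, so the following is only a minimal sketch of the general idea it describes: penalizing error cases according to their severity, with an adjustable confidence threshold. The function name `concern_weighted_score`, the `concern` penalty matrix, and the rule that a wrong class counts once its predicted probability reaches `threshold` are all assumptions made for illustration, not the paper's definition.

```python
import numpy as np


def concern_weighted_score(probs, labels, concern, threshold=0.5):
    """Hypothetical severity-weighted error score (NOT the paper's formula).

    probs:     (n, k) predicted class probabilities
    labels:    (n,)   true class indices
    concern:   (k, k) penalties; concern[t, c] is the assumed cost of
               flagging class c when the true class is t
    threshold: a wrong class is counted once its probability >= threshold
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    concern = np.asarray(concern, dtype=float)
    n = probs.shape[0]

    # Mark every class whose predicted probability reaches the threshold.
    flagged = probs >= threshold
    # The true class is never an error, so it is never penalized.
    flagged[np.arange(n), labels] = False

    # Row i holds the penalties for each class given sample i's true label.
    penalties = concern[labels]
    # Sum the penalties of flagged wrong classes, then average over samples.
    return float((flagged * penalties).sum(axis=1).mean())


if __name__ == "__main__":
    probs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
    labels = [0, 0, 2]
    # Assumed values: confusing class 0 with class 1 is the "dangerous" case.
    concern = [[0, 5, 1], [1, 0, 1], [1, 1, 0]]
    print(concern_weighted_score(probs, labels, concern, threshold=0.5))
    print(concern_weighted_score(probs, labels, concern, threshold=0.25))
```

In this sketch, lowering `threshold` flags more low-confidence wrong classes, so the score can only stay the same or rise. That gives one way to trade a small amount of general performance for stricter control over specific dangerous cases, in the spirit of the compromise the abstract describes.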

Code Implementations: 2 repos
