LG, ML · Oct 21, 2019

Who wants accurate models? Arguing for a different metrics to take classification models seriously

arXiv:1910.09246v2 · 4 citations
AI Analysis

This work addresses the need for better certification and validation of AI decision support in healthcare, though its contribution is incremental, since it builds on existing metrics such as balanced accuracy.

The paper tackles the problem of inadequate performance metrics for AI systems in clinical practice by proposing H-accuracy, a new measure that incorporates clinician preferences and error impacts, and demonstrates its descriptive power through user studies.

With the increasing availability of AI-based decision support, there is an increasing need for its certification by both AI manufacturers and notified bodies, as well as for the pragmatic (real-world) validation of these systems. Meaningful and informative ways to assess the performance of AI systems in clinical practice are therefore needed. Common metrics (such as accuracy scores and areas under the ROC curve) have known problems, and they do not take into account important information about the preferences of clinicians and the needs of their specialist practice, such as the likelihood and impact of errors and the complexity of cases. In this paper, we present a new accuracy measure, the H-accuracy (Ha), which we claim is more informative in the medical domain (and in other domains with similar needs) because of the elements it encompasses. We also prove that the H-accuracy is a generalization of the balanced accuracy, and we establish a relation between the H-accuracy and the Net Benefit. Finally, we report an experiment in two user studies that shows the descriptive power of the Ha score and how complementary, differently informative measures can be derived from its formulation (a Python script to compute Ha is also made available).
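The abstract positions Ha relative to two existing metrics, balanced accuracy and Net Benefit, but does not reproduce the Ha formula itself. As hedged context only, the sketch below computes those two reference metrics from binary confusion-matrix counts using their standard definitions: balanced accuracy as the mean of sensitivity and specificity, and Net Benefit in the usual decision-curve form NB = TP/n - (FP/n) * p_t / (1 - p_t), where p_t is the threshold probability. It is not an implementation of H-accuracy and not the authors' script; the `Confusion` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for binary confusion-matrix counts (positive = condition present)."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def n(self) -> int:
        # Total number of cases.
        return self.tp + self.fp + self.tn + self.fn


def balanced_accuracy(c: Confusion) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    if c.tp + c.fn == 0 or c.tn + c.fp == 0:
        raise ValueError("both classes must be present to compute balanced accuracy")
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return (sensitivity + specificity) / 2


def net_benefit(c: Confusion, threshold: float) -> float:
    """Decision-curve Net Benefit at threshold probability p_t.

    False positives are penalised by the odds p_t / (1 - p_t), i.e. the
    relative harm of an unnecessary intervention implied by the threshold.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1 - threshold)
    return c.tp / c.n - (c.fp / c.n) * odds
```

For example, `balanced_accuracy(Confusion(tp=40, fp=10, tn=40, fn=10))` evaluates to 0.8, because sensitivity and specificity are both 0.8. Balanced accuracy weighs the two error types equally regardless of class prevalence, whereas Net Benefit makes the relative cost of false positives explicit through the threshold; the abstract claims Ha generalizes the former and relates to the latter.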
