LG, Dec 22, 2021

Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation

arXiv:2112.11832v2
Originality: Incremental advance
AI Analysis

This work addresses the problem of evaluating ML model reliability for practitioners, though it is incremental as it builds on existing baseline practices.

The paper tackles the challenge of determining acceptable performance thresholds for ML models by developing geometric complexity measures that automatically set baselines and identify difficult-to-classify data points. Experiments on synthetic and real chatbot data show the measures effectively highlight regions prone to misclassification.

Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), or systems that contain ML models, is highly challenging. In addition to the challenges of testing classical software, it is acceptable and expected that statistical ML models sometimes output incorrect results. A major challenge is to determine when the level of incorrectness, e.g., model accuracy or F1 score for classifiers, is acceptable and when it is not. In addition to business requirements that should provide a threshold, it is a best practice to require any proposed ML solution to outperform simple baseline models, such as a decision tree. We have developed complexity measures, which quantify how difficult given observations are to assign to their true class label; these measures can then be used to automatically determine a baseline performance threshold. These measures are superior to the best practice baseline in that, for a linear computation cost, they also quantify each observation's classification complexity in an explainable form, regardless of the classifier model used. Our experiments with both numeric synthetic data and real natural language chatbot data demonstrate that the complexity measures effectively highlight data regions and observations that are likely to be misclassified.
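The abstract does not define the complexity measures themselves, so the following is only an illustrative sketch of the general idea: score each observation by how hard it is to assign to its true label, then derive a baseline from those scores. It uses a k-nearest-neighbour label-disagreement score as a stand-in, which is an assumption and not the paper's measure. The names `knn_disagreement` and `baseline_accuracy`, the parameter `k`, and the 0.5 threshold are all hypothetical. The brute-force distance computation here is quadratic, so it does not reflect the linear cost claimed for the paper's method.

```python
import numpy as np

def knn_disagreement(X, y, k=3):
    """Stand-in complexity score (not the paper's measure): the fraction of
    each point's k nearest neighbours whose label differs from its own."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; O(n^2), acceptable for a sketch.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]     # indices of the k nearest points
    return (y[nn] != y[:, None]).mean(axis=1)

def baseline_accuracy(complexity, threshold=0.5):
    """Hypothetical baseline: share of observations whose neighbours
    mostly agree with their true label."""
    return float((np.asarray(complexity) < threshold).mean())

# Two well-separated clusters plus one point labelled against its surroundings.
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1],
          [10, 10], [10, 11], [11, 10], [11, 11],
          [0.5, 0.5]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1]

complexity = knn_disagreement(X_demo, y_demo, k=3)
hardest = int(np.argmax(complexity))       # index of the most complex observation
baseline = baseline_accuracy(complexity)
```

In this toy data the last point sits inside the class-0 cluster while carrying label 1, so it receives the highest score and is the one flagged as likely to be misclassified. Because the score is computed per observation and without reference to any trained classifier, it can be inspected directly, which mirrors the model-independent, explainable property the abstract describes.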

