HC · Apr 16, 2020

CrossCheck: Rapid, Reproducible, and Interpretable Model Evaluation

arXiv:2004.07993v1730 citations
AI Analysis

CrossCheck addresses data scientists' need for more interpretable and reproducible model evaluation, though the contribution is incremental: it builds on existing visualization and workflow-integration methods.

The paper tackles the limits of evaluating models by aggregate metrics alone. It introduces CrossCheck, an interactive visualization tool for cross-model comparison and error analysis, and demonstrates its utility in three use cases: named entity recognition, reading comprehension, and clickbait detection.

Evaluation beyond aggregate performance metrics, e.g., F1-score, is crucial both to establish an appropriate level of trust in machine learning models and to identify future model improvements. In this paper we demonstrate CrossCheck, an interactive visualization tool for rapid cross-model comparison and reproducible error analysis. We describe the tool and discuss design and implementation details. We then present three use cases (named entity recognition, reading comprehension, and clickbait detection) that show the benefits of using the tool for model evaluation. CrossCheck allows data scientists to make informed decisions when choosing between multiple models, identify when the models are correct and for which examples, investigate whether the models are making the same mistakes as humans, evaluate models' generalizability, and highlight models' limitations, strengths, and weaknesses. Furthermore, CrossCheck is implemented as a Jupyter widget, which allows rapid and convenient integration into data scientists' model development workflows.
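
As a rough illustration of why per-example, cross-model comparison can reveal more than an aggregate score, the sketch below counts which examples two models get right. It is not CrossCheck's API or implementation: the function name `cross_model_breakdown`, the toy labels, and the two "models" are all hypothetical, chosen only to show the idea in plain Python.

```python
from collections import Counter


def cross_model_breakdown(gold, preds_a, preds_b):
    """Count examples by which of two models predicted them correctly.

    Hypothetical helper for illustration; not part of CrossCheck.
    """
    if not (len(gold) == len(preds_a) == len(preds_b)):
        raise ValueError("all inputs must have the same length")

    counts = Counter()
    for y, a, b in zip(gold, preds_a, preds_b):
        # Key is (model A correct?, model B correct?)
        counts[(a == y, b == y)] += 1

    return {
        "both_correct": counts[(True, True)],
        "only_a_correct": counts[(True, False)],
        "only_b_correct": counts[(False, True)],
        "both_wrong": counts[(False, False)],
    }


# Toy data (made up): both models score 4/6 accuracy,
# yet they fail on different examples.
gold = ["PER", "ORG", "LOC", "PER", "ORG", "LOC"]
model_a = ["PER", "ORG", "PER", "PER", "LOC", "LOC"]
model_b = ["PER", "LOC", "LOC", "PER", "LOC", "LOC"]

breakdown = cross_model_breakdown(gold, model_a, model_b)
print(breakdown)
```

In this toy case the two models have the same accuracy, but the breakdown shows one example only the first model gets right, one only the second gets right, and one both miss, which is the kind of difference an aggregate metric hides.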

Code Implementations: 1 repo