HCAI · Apr 9, 2021

Model LineUpper: Supporting Interactive Model Comparison at Multiple Levels for AutoML

arXiv:2104.04375v1 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for data scientists to evaluate AutoML models on multiple criteria, such as error types and feature importance, rather than on performance metrics alone. Its contribution is an incremental enhancement to existing AutoML workflows.

The paper tackles the problem of humans selecting a final model from multiple AutoML candidates. It presents Model LineUpper, a tool that integrates Explainable AI (XAI) and visualization techniques to support interactive model comparison beyond performance metrics. The results include insights from a user study on how users compare models, along with design implications for supporting data scientists in AutoML systems.

Automated Machine Learning (AutoML) is a rapidly growing set of technologies that automate the model development pipeline by searching model space and generating candidate models. A critical, final step of AutoML is human selection of a final model from dozens of candidates. In current AutoML systems, selection is supported only by performance metrics. Prior work has shown that in practice, people evaluate ML models based on additional criteria, such as the way a model makes predictions. Comparison may happen at multiple levels, from types of errors, to feature importance, to how the model makes predictions of specific instances. We developed Model LineUpper to support interactive model comparison for AutoML by integrating multiple Explainable AI (XAI) and visualization techniques. We conducted a user study in which we both evaluated the system and used it as a technology probe to understand how users perform model comparison in an AutoML system. We discuss design implications for utilizing XAI techniques for model comparison and supporting the unique needs of data scientists in comparing AutoML models.
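The abstract argues that a single performance metric can hide important differences between candidate models. The following minimal Python sketch illustrates that point. It is not Model LineUpper's implementation. It compares two hypothetical binary classifiers at two of the levels the abstract names: types of errors and predictions on specific instances. The function names, the binary 0/1 label encoding, and the example data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Metric-level view: fraction of instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_types(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Error-level view: confusion counts for a binary (0/1) classifier."""
    counts: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1
        elif t == 0 and p == 1:
            counts["FP"] += 1
        elif t == 1 and p == 0:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return {k: counts[k] for k in ("TP", "FP", "FN", "TN")}


def disagreements(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> List[Tuple[int, int, int, int]]:
    """Instance-level view: (index, true label, A's prediction, B's prediction)
    for every instance on which the two candidate models disagree."""
    return [
        (i, t, a, b)
        for i, (t, a, b) in enumerate(zip(y_true, pred_a, pred_b))
        if a != b
    ]


if __name__ == "__main__":
    # Hypothetical labels and predictions from two AutoML candidates.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    pred_a = [1, 0, 0, 1, 0, 1, 1, 0]
    pred_b = [1, 1, 1, 1, 0, 1, 1, 0]

    for name, pred in (("A", pred_a), ("B", pred_b)):
        print(name, accuracy(y_true, pred), error_types(y_true, pred))
    print("disagree on:", disagreements(y_true, pred_a, pred_b))
```

With this example data, both candidates score 0.75 accuracy. However, model A makes one false positive and one false negative, while model B makes two false positives and no false negatives. The two models also disagree on instances 1 and 2. A metric-only leaderboard would hide exactly this kind of difference. A feature-importance view, the third level mentioned in the abstract, would need access to the models themselves and is omitted here for brevity.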

