HC Sep 1, 2020

Towards Evaluating Exploratory Model Building Process with AutoML Systems

Sungsoo Ray Hong, Sonia Castelo, Vito D'Orazio, Christopher Benthune, Aecio Santos, Scott Langevin, David Jonker, Enrico Bertini, Juliana Freire

arXiv:2009.00449v1, 7.94 citations

Originality: Incremental advance

AI Analysis

This paper addresses the problem of evaluating complex, exploratory AutoML systems for builders, but the advance is incremental: it builds on existing evaluation challenges without introducing a new paradigm.

The paper tackles the challenge of evaluating exploratory AutoML systems by proposing a methodology that divides systems into sub-components and visualizes user behavior and attitudes, finding that it helped professional builders gain novel insights and identify design improvements.

The use of Automated Machine Learning (AutoML) systems is highly open-ended and exploratory. While rigorously evaluating how end-users interact with AutoML is crucial, establishing a robust evaluation methodology for such exploratory systems is challenging. First, AutoML is complex, including multiple sub-components that support a variety of sub-tasks for synthesizing ML pipelines, such as data preparation, problem specification, and model generation, making it difficult to yield insights that tell us which components were successful and which were not. Second, because the usage pattern of AutoML is highly exploratory, it is not possible to rely solely on widely used task efficiency and effectiveness metrics as success metrics. To tackle these evaluation challenges, we propose an evaluation methodology that (1) guides AutoML builders to divide their AutoML system into multiple sub-system components, and (2) helps them reason about each component through visualization of end-users' behavioral patterns and attitudinal data. We conducted a study to understand when, how, and why applying our methodology can help builders better understand their systems and end-users. We recruited 3 teams of professional AutoML builders. The teams prepared their own systems and let 41 end-users use the systems. Using our methodology, we visualized end-users' behavioral and attitudinal data and distributed the results to the teams. We analyzed the results in two directions: (1) what types of novel insights the AutoML builders learned from end-users, and (2) how the evaluation methodology helped the builders to understand workflows and the effectiveness of their systems. Our findings suggest new insights explaining future design opportunities in the AutoML domain as well as how using our methodology helped the builders to determine insights and let them draw concrete directions for improving their systems.
