Boxer: Interactive Comparison of Classifier Results
This work addresses the need of machine learning practitioners to compare and diagnose classifiers efficiently. Its contribution is incremental, since it builds on existing visualization and interaction techniques.
The authors tackle the problem of comparing classifier results with Boxer, a system for interactively exploring subsets of training and testing instances in support of tasks such as model selection and fairness assessment.
Machine learning practitioners often compare the results of different classifiers to help select, diagnose, and tune models. We present Boxer, a system to enable such comparison. Our system facilitates interactive exploration of the experimental results obtained by applying multiple classifiers to a common set of model inputs. The approach focuses on allowing the user to identify interesting subsets of training and testing instances and to compare the performance of the classifiers on these subsets. The system couples standard visual designs with set algebra interactions and comparative elements. This allows the user to compose and coordinate views to specify subsets and assess classifier performance on them. The flexibility of these compositions allows the user to address a wide range of scenarios in developing and assessing classifiers. We demonstrate Boxer in use cases including model selection, tuning, fairness assessment, and data quality diagnosis.
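The subset-based comparison described above can be illustrated with a minimal Python sketch. It is not Boxer's implementation or API: the toy labels, the classifier names `clf_a` and `clf_b`, and the helper functions `errors` and `accuracy` are all hypothetical, chosen only to show how set algebra over instance ids can specify subsets on which classifiers are then compared.

```python
# Hypothetical toy data (not from the paper): instance id -> true label,
# plus the predictions of two made-up classifiers on the same instances.
truth = {0: 1, 1: 0, 2: 1, 3: 1, 4: 0, 5: 0}
predictions = {
    "clf_a": {0: 1, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0},
    "clf_b": {0: 1, 1: 1, 2: 1, 3: 0, 4: 1, 5: 0},
}


def errors(name):
    """Return the set of instance ids that classifier `name` gets wrong."""
    return {i for i, y in truth.items() if predictions[name][i] != y}


def accuracy(name, subset):
    """Return the accuracy of classifier `name` restricted to `subset`."""
    if not subset:
        return float("nan")  # accuracy is undefined on an empty subset
    correct = sum(predictions[name][i] == truth[i] for i in subset)
    return correct / len(subset)


err_a, err_b = errors("clf_a"), errors("clf_b")

# Set algebra composes selections into new subsets of interest.
both_wrong = err_a & err_b      # instances both classifiers miss
only_a_wrong = err_a - err_b    # instances only clf_a misses
either_wrong = err_a | err_b    # instances at least one classifier misses

# Compare every classifier on one selected subset (here: positive instances).
positives = {i for i, y in truth.items() if y == 1}
acc_on_positives = {name: accuracy(name, positives) for name in predictions}
```

In this sketch each selection is a plain set of instance ids, so intersections, differences, and unions can be chained freely before any metric is computed on the result. An interactive tool would presumably derive such selections from linked views rather than from hand-written expressions.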