LG HC · Feb 10, 2025

Boosting of Classification Models with Human-in-the-Loop Computational Visual Knowledge Discovery

arXiv:2502.07039v1 · 4.1 · 3 citations · h-index: 5 · HCI

Originality: Incremental advance

AI Analysis

This addresses the need for accurate and interpretable models in high-risk domains like healthcare diagnosis, though it appears incremental as it builds on existing boosting and visualization methods.

The paper tackles the problem of improving accuracy and interpretability in high-risk classification tasks by shifting the focus of boosting from only the misclassified cases to all cases in class-overlap areas, using Computational and Interactive Visual Learning with human expertise. The results include a perfectly accurate and interpretable model on the Iris dataset, and experiments on simulated data show general benefits to accuracy and interpretability.
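To make "focusing only on misclassified cases" concrete, here is a minimal sketch of one round of the textbook discrete AdaBoost reweighting step (Freund and Schapire), assuming binary labels in {-1, +1} and case weights that already sum to 1. This is the standard baseline the paper departs from, not the paper's CIVL method; the function name `adaboost_reweight` is hypothetical.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting (labels in {-1, +1}).

    Assumes `weights` sums to 1. Returns the base classifier's vote weight
    alpha and the renormalized case weights.
    """
    # Weighted error: total weight carried by the misclassified cases.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-12), 1 - 1e-12)  # avoid log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p == -1 for a misclassified case, so its weight is scaled by e^{+alpha};
    # correctly classified cases are scaled by e^{-alpha}.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(new)  # normalizer so the weights sum to 1 again
    return alpha, [w / z for w in new]


# Four cases with uniform weights; only the second case is misclassified.
alpha, w = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
print(alpha, w)
```

After this update the single misclassified case holds half of the total weight (0.5) and each correct case drops to 1/6, which is exactly the "misclassified cases only" emphasis that the paper proposes to widen to every case in a class-overlap area.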

High-risk artificial intelligence and machine learning classification tasks, such as healthcare diagnosis, require accurate and interpretable prediction models. However, classifier algorithms typically sacrifice individual case accuracy for overall model accuracy, limiting analysis of class overlap areas regardless of task significance. The Adaptive Boosting meta-algorithm, which won the 2003 Gödel Prize, analytically assigns higher weights to misclassified cases so that they are reclassified in later rounds. However, it relies on weak base classifiers that are iteratively strengthened, which limits the gains that can come from the base classifiers themselves. Combining visual and computational approaches enables selecting stronger base classifiers before boosting. This paper proposes moving boosting methodology from focusing only on misclassified cases to all cases in the class overlap areas, using Computational and Interactive Visual Learning (CIVL) with a Human-in-the-Loop. It builds classifiers in lossless visualizations that integrate human domain expertise and visual insights. A Divide and Classify process splits cases into simple and complex ones and classifies each group separately, through computational analysis and through data visualization in the lossless visualization spaces of Parallel Coordinates or other General Line Coordinates. After the pure and overlap class areas are found, the simple cases in pure areas are classified, generating interpretable sub-models such as decision rules in Propositional and First-order Logics. Only the multidimensional cases in the overlap areas are losslessly visualized, which simplifies the end user's cognitive task of identifying difficult case patterns, including engineering features that form new classifiable patterns. A demonstration yields a perfectly accurate and losslessly interpretable model of the Iris dataset, and simulated data show generalized benefits to the accuracy and interpretability of models, increasing end-user confidence in the discovered models.
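The Divide and Classify idea can be illustrated with a deliberately simplified sketch. Assumptions, none of which come from the paper: "pure" and "overlap" areas are defined by per-class [min, max] intervals on a single chosen attribute, pure cases are labeled by the interval rule "if lo <= x <= hi then class c", and the remaining overlap cases are converted to Parallel Coordinates polylines (vertex k = (axis k, value x_k)) for visual inspection. The function names and the toy data are hypothetical; the paper's actual process uses General Line Coordinates and human judgment rather than this fixed rule.

```python
from collections import defaultdict


def class_intervals(xs, ys):
    """Per-class (min, max) range of one attribute."""
    bounds = defaultdict(lambda: [float("inf"), float("-inf")])
    for x, y in zip(xs, ys):
        bounds[y][0] = min(bounds[y][0], x)
        bounds[y][1] = max(bounds[y][1], x)
    return {c: tuple(b) for c, b in bounds.items()}


def divide_and_classify(cases, ys, attr=0):
    """Split cases into pure-area cases (labeled by an interval rule) and
    overlap-area cases (left for visual, human-in-the-loop analysis)."""
    xs = [case[attr] for case in cases]
    intervals = class_intervals(xs, ys)
    rules, overlap = {}, []
    for i, x in enumerate(xs):
        # Classes whose interval contains x (always at least the case's own class).
        hits = [c for c, (lo, hi) in intervals.items() if lo <= x <= hi]
        if len(hits) == 1:
            rules[i] = hits[0]  # pure area: a simple propositional rule decides
        else:
            overlap.append(i)   # overlap area: defer to visualization
    return intervals, rules, overlap


def parallel_coordinates_polyline(case):
    """Lossless Parallel Coordinates encoding: one vertex per attribute axis,
    so the original values can be read back from the polyline."""
    return [(k, v) for k, v in enumerate(case)]


# Made-up 2-D cases in three classes; split on attribute 0.
cases = [(1.0, 0.2), (1.5, 0.3), (3.0, 1.0), (4.0, 1.3),
         (5.2, 1.5), (5.0, 1.8), (5.5, 2.0), (6.0, 2.3)]
ys = ["a", "a", "b", "b", "b", "c", "c", "c"]
intervals, rules, overlap = divide_and_classify(cases, ys)
polylines = [parallel_coordinates_polyline(cases[i]) for i in overlap]
print(intervals, rules, overlap, polylines)
```

Here classes b (3.0 to 5.2) and c (5.0 to 6.0) overlap on [5.0, 5.2], so six cases are settled by interval rules and only the two cases inside that band are passed on as full multidimensional polylines, mirroring the abstract's point that only overlap-area cases need to be visualized.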
