LG HC May 28, 2023

Interactive Decision Tree Creation and Enhancement with Complete Visualization for Explainable Modeling

Boris Kovalerchuk, Andrew Dunn, Alex Worland, Sridevi Wagle

arXiv:2305.18432v13.811 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for more explainable AI models, particularly for domain experts and end users, by providing enhanced visualization tools for decision trees. It is an incremental advance because it builds on existing visualization techniques.

The paper tackles the problem of improving interpretability and prediction accuracy of machine learning models by introducing two new visualization methods for decision trees using General Line Coordinates, which allow for detailed analysis of attributes, data flow, and model sensitivity, as demonstrated on benchmark datasets.

Visualization of Machine Learning (ML) models is a key part of the ML process for increasing their interpretability and prediction accuracy. Decision Trees (DTs) are essential in ML because they are used to understand many black-box ML models, including Deep Learning models. This research proposes two new methods for creating and enhancing Decision Trees as understandable models with complete visualization. These methods use two versions of General Line Coordinates (GLC): Bended Coordinates (BC) and Shifted Paired Coordinates (SPC). Bended Coordinates are a set of line coordinates in which each coordinate is bent at the threshold point of the respective DT node. In SPC, each n-D point is visualized as a directed graph in a set of shifted pairs of 2-D Cartesian coordinates. These new methods expand and complement the capabilities of existing methods to visualize DT models more completely. These capabilities allow us to observe and analyze: (1) relations between attributes, (2) individual cases relative to the DT structure, (3) data flow in the DT, (4) sensitivity of each split threshold in the DT nodes, and (5) density of cases in parts of the n-D space. These features are critical for evaluation and improvement of DT model performance by domain experts and end users, as they help to prevent overgeneralization and overfitting of the models. The advantages of this methodology are illustrated in case studies on real-world benchmark datasets. The paper also demonstrates how to generalize these methods to decision tree visualizations in other General Line Coordinates.
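The SPC mapping described in the abstract can be illustrated with a minimal sketch. It assumes an even number of attributes, that consecutive attributes are paired as (x1, x2), (x3, x4), and so on, and that each pair's 2-D coordinate system is displaced by a caller-supplied offset. The function name `spc_polyline` and the `shifts` parameter are hypothetical and do not come from the paper, which may choose pairings and shifts differently.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def spc_polyline(
    point: Sequence[float], shifts: Sequence[Point2D]
) -> Tuple[List[Point2D], List[Tuple[Point2D, Point2D]]]:
    """Map one n-D point to a directed graph in shifted paired coordinates.

    Assumption (not from the paper): attributes are paired consecutively and
    pair k is drawn in a Cartesian system whose origin is moved by shifts[k].
    """
    if len(point) % 2 != 0:
        raise ValueError("this sketch expects an even number of attributes")
    n_pairs = len(point) // 2
    if len(shifts) != n_pairs:
        raise ValueError("one shift per attribute pair is required")

    nodes: List[Point2D] = []
    for k in range(n_pairs):
        x, y = point[2 * k], point[2 * k + 1]
        dx, dy = shifts[k]
        # Position of the pair in the shared drawing plane.
        nodes.append((x + dx, y + dy))

    # Directed edges connect the pairs in attribute order.
    edges = list(zip(nodes[:-1], nodes[1:]))
    return nodes, edges
```

For example, `spc_polyline([1, 2, 3, 4], [(0, 0), (5, 0)])` places the first pair at (1, 2) and the second at (8, 4), joined by a single directed edge. Drawing many cases this way lets their paths be compared against the split thresholds of a tree.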
