ML LG · Jan 2, 2020

Explainable outlier detection through decision tree conditioning

arXiv:2001.00636v1 · 9 citations

AI Analysis

This work addresses the need for interpretable outlier detection in data analysis, though it appears incremental, since it builds on concepts from the existing GritBot software.

The paper tackles the problem of making outlier detection explainable by developing OutlierTree, a method that uses supervised decision tree splits to identify outliers with human-readable explanations based on branch conditions and confidence intervals. The approach produces interpretable outlier explanations by contrasting flagged values against distribution statistics of non-outlier observations in the same branch.
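
The flag-and-explain step summarized above can be sketched for a single branch in Python. The sketch assumes the branch has already been formed by some split condition. It also substitutes a robust interval (median ± k·1.4826·MAD) for the paper's confidence intervals. The function name `explain_branch_outliers`, the parameter `k`, and the `age`/`income` example are hypothetical. Each value outside the interval is reported with the branch condition and with the mean, standard deviation, and range of the remaining non-outlier values in that branch.

```python
import statistics


def explain_branch_outliers(values, condition, target, k=3.0):
    """Flag values of `target` that fall outside a robust interval within one
    branch, and explain each one against the branch's non-outlier values.

    Simplified sketch: the interval median +/- k * 1.4826 * MAD is an assumed
    stand-in for the paper's confidence intervals, not OutlierTree's own rule.
    """
    med = statistics.median(values)
    # Median absolute deviation, scaled to be comparable to a standard deviation
    scale = 1.4826 * statistics.median([abs(v - med) for v in values])
    low, high = med - k * scale, med + k * scale

    # Distribution statistics of the non-outlier observations in the same branch
    inliers = [v for v in values if low <= v <= high]
    mean = statistics.fmean(inliers)
    sd = statistics.pstdev(inliers)

    explanations = []
    for v in values:
        if low <= v <= high:
            continue
        direction = "high" if v > high else "low"
        # Contrast the flagged value with the branch condition and inlier stats
        explanations.append((
            v,
            f"{target} = {v} is unusually {direction} when {condition} "
            f"(non-outliers: mean {mean:.2f}, sd {sd:.2f}, "
            f"range [{min(inliers)}, {max(inliers)}])",
        ))
    return explanations


# Hypothetical branch: `income` values of observations satisfying "age <= 30"
branch = [10, 11, 9, 10, 12, 11, 10, 48]
for value, text in explain_branch_outliers(branch, "age <= 30", "income"):
    print(text)
```

The sketch uses a median/MAD interval because a plain mean ± 3·sd interval computed on the branch, including the outlier itself, can be inflated by that same outlier and then fail to flag it (masking).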

This work describes an outlier detection procedure (named "OutlierTree") loosely based on the GritBot software developed by RuleQuest Research. It works by evaluating and following supervised decision tree splits on variables. In the resulting branches, 1-d confidence intervals are constructed for the target variable, and potential outliers are flagged according to these confidence intervals. Under this logic, it is possible to produce human-readable explanations for why a given value of a variable in an observation can be considered an outlier. Such an explanation combines the decision tree branch conditions with general distribution statistics of the non-outlier observations that fell into the same branch, and contrasts them against the value that lies outside the CI. The supervised splits help ensure that the generated conditions are not spurious. Instead, they are related to the target variable and have logical breakpoints.
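
The abstract's last point concerns how splits are chosen. A *supervised* split can be sketched as a search for the threshold on a numeric conditioning variable that minimizes the within-branch sum of squared errors of the target. This is the criterion used by CART-style regression trees. The criterion, the `min_branch` parameter, the `best_split` helper, and the example data are assumptions for illustration. The paper's actual split criterion may differ.

```python
def _sse(values):
    """Sum of squared deviations from the mean (within-branch spread of the target)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_branch=2):
    """Find the threshold on a numeric conditioning variable `x` that best
    separates the target `y`, by minimizing the total within-branch SSE.

    Returns (threshold, cost), or None when no valid split exists.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = None
    # Each candidate cut leaves at least `min_branch` observations per side
    for i in range(min_branch, n - min_branch + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = _sse(left) + _sse(right)
        if best is None or cost < best[1]:
            # Midpoint between neighbouring x values serves as the breakpoint
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, cost)
    return best


# Hypothetical data: the target y jumps once x exceeds 4
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 11, 10, 12, 30, 31, 29, 30]
threshold, cost = best_split(x, y)
print(f"branches: x <= {threshold} | x > {threshold} (SSE {cost:.2f})")
```

The threshold is chosen to reduce the target's spread. As a result, the branch conditions tend to follow breakpoints that matter for the target (here `x <= 4.5` vs `x > 4.5`) rather than arbitrary cut-offs. Each branch's target values would then get their own interval.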
