ML | LG | May 14, 2018

A One-Class Classification Decision Tree Based on Kernel Density Estimation

arXiv:1805.05021v3 | 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of interpretability in one-class classification for domains such as medical diagnosis, though the contribution appears incremental because it combines existing techniques.

The authors address the need for interpretable one-class classification models by proposing a hybrid method called OC-Tree, which uses kernel density estimation within a decision tree framework. They report that it performs favorably against state-of-the-art methods on benchmark datasets.

One-class classification (OCC) is an area of machine learning that addresses prediction based on unbalanced datasets. OCC algorithms are trained on samples from a single class, potentially with some additional counter-examples. Current OCC models perform satisfactorily, but there is an increasing need for interpretable models. In the present work, we propose a one-class model that addresses both performance and interpretability. Our hybrid OCC method relies on density estimation as part of a tree-based learning algorithm, called One-Class decision Tree (OC-Tree). Within a greedy and recursive approach, our proposal relies on kernel density estimation to split a data subset on the basis of one or several intervals of interest. Thus, the OC-Tree encloses data within hyper-rectangles of interest, which can be described by a set of rules. Against state-of-the-art methods such as Cluster Support Vector Data Description (ClusterSVDD), One-Class Support Vector Machine (OCSVM) and isolation Forest (iForest), the OC-Tree performs favorably on a range of benchmark datasets. Furthermore, we present a real medical application in which the OC-Tree has demonstrated its effectiveness as an interpretable diagnosis aid based on unbalanced datasets.
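
The splitting idea described in the abstract (estimate a density along one attribute, then keep the intervals where that density is high) can be illustrated with a rough sketch. This is not the authors' algorithm: the Gaussian kernel, the fixed bandwidth, the relative density cutoff `rel_threshold`, the evaluation grid, and all function names are assumptions chosen for illustration only.

```python
import math


def gaussian_kde_1d(samples, bandwidth):
    """Return a function that estimates the density of `samples` (Gaussian kernel)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def intervals_of_interest(samples, bandwidth, rel_threshold=0.1, grid_size=200):
    """Find intervals along one attribute where the estimated density is high.

    Assumption: "high" means at least `rel_threshold` times the maximum
    density on an evenly spaced grid. The paper may use a different criterion.
    """
    density = gaussian_kde_1d(samples, bandwidth)
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    dens = [density(x) for x in grid]
    cutoff = rel_threshold * max(dens)

    intervals = []
    start = None
    prev_x = grid[0]
    for x, d in zip(grid, dens):
        if d >= cutoff and start is None:
            start = x                          # entering a dense region
        elif d < cutoff and start is not None:
            intervals.append((start, prev_x))  # leaving a dense region
            start = None
        prev_x = x
    if start is not None:
        intervals.append((start, grid[-1]))
    return intervals


def inside_any(x, intervals):
    """Rule check for one attribute: is x inside one of the kept intervals?"""
    return any(a <= x <= b for a, b in intervals)


# Hypothetical one-dimensional training data with two separate groups.
target_samples = [-0.4, -0.2, 0.0, 0.2, 0.4, 9.6, 9.8, 10.0, 10.2, 10.4]
found = intervals_of_interest(target_samples, bandwidth=0.5)
```

Applying such a split recursively, attribute by attribute, would enclose the training data in axis-aligned boxes, each readable as a conjunction of "attribute in [a, b]" rules. That is the sense in which the abstract speaks of hyper-rectangles described by a set of rules. A point such as 5.0, which lies between the two groups above, falls outside every interval and would be rejected.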

