LGMay 18, 2015

Ensemble of Example-Dependent Cost-Sensitive Decision Trees

arXiv:1505.04637v124 citations
Originality Incremental advance
AI Analysis

This work addresses cost-sensitive classification for applications like fraud detection and credit scoring, but it is incremental as it builds on existing cost-sensitive decision trees with ensemble improvements.

The authors tackled the problem of example-dependent cost-sensitive classification, where misclassification costs vary per example, by proposing an ensemble framework of cost-sensitive decision trees and two new combination methods, achieving higher savings across five real-world databases compared to state-of-the-art techniques.

Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples and not only within classes. However, standard classification methods do not take these costs into account, and assume a constant cost of misclassification errors. In previous works, some methods that take into account the financial costs into the training of different algorithms have been proposed, with the example-dependent cost-sensitive decision tree algorithm being the one that gives the highest savings. In this paper we propose a new framework of ensembles of example-dependent cost-sensitive decision-trees. The framework consists in creating different example-dependent cost-sensitive decision trees on random subsamples of the training set, and then combining them using three different combination approaches. Moreover, we propose two new cost-sensitive combination approaches; cost-sensitive weighted voting and cost-sensitive stacking, the latter being based on the cost-sensitive logistic regression method. Finally, using five different databases, from four real-world applications: credit card fraud detection, churn modeling, credit scoring and direct marketing, we evaluate the proposed method against state-of-the-art example-dependent cost-sensitive techniques, namely, cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision trees. The results show that the proposed algorithms have better results for all databases, in the sense of higher savings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes