AI, DB | Sep 24, 2015

CRDT: Correlation Ratio Based Decision Tree Model for Healthcare Data Mining

arXiv:1509.07266v1 | 16 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in healthcare data mining by reducing bias in attribute selection for decision trees, but it is incremental as it modifies an existing method.

The authors tackled the problem of Information Gain-based decision trees performing poorly on healthcare datasets with many distinct attribute values by proposing a Correlation Ratio-based decision tree variant, which they demonstrated to be effective on benchmark healthcare datasets.

The phenomenal growth of healthcare data has motivated us to investigate robust and scalable models for data mining. For classification problems, the Information Gain (IG) based Decision Tree is a popular choice. However, depending on the nature of the dataset, an IG-based Decision Tree may not always perform well, because it prefers attributes with a larger number of distinct values as the splitting attribute. Healthcare datasets generally have many attributes, and each attribute generally has many distinct values. In this paper, we focus on this characteristic of the datasets while analysing the performance of our proposed approach, a variant of the Decision Tree model that uses the concept of Correlation Ratio (CR). Unlike the IG-based approach, the CR-based approach is not biased towards attributes with a larger number of distinct values. We have applied our model to several benchmark healthcare datasets to show the effectiveness of the proposed technique.
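The contrast between the two criteria can be illustrated with a small sketch. It is not the paper's implementation. It assumes the textbook correlation ratio (eta squared: between-class variation of a numeric attribute divided by its total variation, with the class labels as the groups), and the abstract does not confirm that this is the exact formulation the authors use. The toy data and the names `record_id` and `marker` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of splitting on each distinct attribute value (multiway split)."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def correlation_ratio(values, labels):
    """Eta squared of a numeric attribute, grouped by class label.

    Assumption: classes are the groups and attribute values are the
    observations; the paper's exact definition may differ.
    """
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    mean = sum(values) / len(values)
    total = sum((v - mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # constant attribute carries no information
    between = sum(
        len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values()
    )
    return between / total


# Hypothetical toy data: eight patients, two classes.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
record_id = [1, 2, 3, 4, 5, 6, 7, 8]   # unique per row, clinically meaningless
marker = [1, 5, 2, 6, 1, 5, 5, 6]      # fewer distinct values, tracks the class

ig_id = information_gain(record_id, labels)
ig_marker = information_gain(marker, labels)
cr_id = correlation_ratio(record_id, labels)
cr_marker = correlation_ratio(marker, labels)

print(f"IG: record_id={ig_id:.3f}  marker={ig_marker:.3f}")
print(f"CR: record_id={cr_id:.3f}  marker={cr_marker:.3f}")
```

On this toy data, IG ranks the identifier-like attribute above the informative one, because every distinct value yields a pure single-row partition. Under the stated assumption, the correlation ratio ranks them the other way round, since the number of groups is fixed by the number of classes rather than by the number of distinct attribute values.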

