LG DB ML

Dec 1, 2016

A New Method for Classification of Datasets for Data Mining

Singh Vijendra, Hemjyotsana Parashar, Nisha Vasudeva

arXiv:1612.00151v1

1 citations

Originality: Synthesis-oriented

AI Analysis

This work offers an incremental improvement for data mining practitioners using decision trees, potentially enhancing classification performance in specific applications.

The paper addresses the tendency of the ID3 decision tree algorithm to favor attributes with many values. It proposes an improved version that divides attributes into groups and applies a selection measure to those groups, repeating the grouping until a good classification/misclassification ratio is reached. The authors report that this classifies datasets more accurately and efficiently.

Decision trees are an important method in both induction research and data mining, used mainly for classification and prediction. ID3 is the most widely used decision tree algorithm so far. In this paper, we discuss ID3's tendency to choose attributes with many values and then propose a new decision tree algorithm that is an improved version of ID3. In the proposed algorithm, attributes are divided into groups and the selection measure 5 is applied to these groups. If the information gain is not good, the attribute values are divided into groups again. These steps are repeated until a good classification/misclassification ratio is obtained. The proposed algorithm classifies data sets more accurately and efficiently.
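The bias mentioned in the abstract can be illustrated with a small sketch of information gain, the standard ID3 selection measure. The toy data, the `row_id` attribute, and the `group_values` helper are hypothetical and chosen only for illustration; the grouping shown is an arbitrary merge of values, not the grouping procedure of the paper, which the abstract does not specify.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on attribute `values`."""
    partitions = defaultdict(list)
    for value, label in zip(values, labels):
        partitions[value].append(label)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def group_values(values, mapping):
    """Hypothetical helper: replace each attribute value by its group name."""
    return [mapping[v] for v in values]


# Hypothetical toy data: 8 records with a balanced binary class.
labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
# A two-valued attribute that carries some real signal.
outlook = ["sun", "sun", "rain", "rain", "sun", "rain", "rain", "sun"]
# An identifier-like attribute with one distinct value per record.
row_id = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]

gain_outlook = information_gain(outlook, labels)
# Every partition holds a single record, so the gain equals the full class entropy.
gain_row_id = information_gain(row_id, labels)

# Merge the eight identifier values into two arbitrary groups (an assumption).
mapping = {v: ("g1" if i < 4 else "g2") for i, v in enumerate(row_id)}
gain_grouped = information_gain(group_values(row_id, mapping), labels)

print(f"outlook: {gain_outlook:.3f}")
print(f"row_id:  {gain_row_id:.3f}")
print(f"grouped: {gain_grouped:.3f}")
```

In this toy case the identifier-like attribute obtains the highest possible gain even though it cannot generalize to new records, while the same attribute with its values merged into two groups loses that advantage. This only motivates why grouping values can counteract the bias; how groups are formed and when the iteration stops are left to the paper itself.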
