Extracting useful rules through improved decision tree induction using information entropy
This work addresses incremental improvements in decision tree induction for data mining applications, potentially benefiting users dealing with large-scale classification tasks.
The authors tackled scalability and efficiency issues in classification for large databases by improving the C4.5 decision tree algorithm, incorporating attribute-oriented induction and relevance analysis with concept hierarchies, and reported results compared to C4.5 on an education dataset.
Classification is a widely used technique in data mining, where scalability and efficiency are immediate problems for classification algorithms applied to large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper, attribute-oriented induction (AOI) and relevance analysis are combined with concept hierarchy knowledge and the HeightBalancePriority algorithm to construct decision trees, along with multilevel mining. Priorities are assigned to attributes by evaluating information entropy at different levels of abstraction for building the decision tree with the HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by the C4.5 classifier for an education dataset, and the results are compared with those of the proposed approach.
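To make the entropy-based prioritization of attributes concrete, the following is a minimal sketch of ranking attributes by information gain, the standard entropy-based relevance measure. It is not the HeightBalancePriority algorithm, whose details are not given here, and it omits the gain-ratio normalization that C4.5 applies. The function names (`entropy`, `information_gain`, `rank_attributes`) and the toy education-style records are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    # Group the target labels by the value each row takes for the attribute.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - remainder


def rank_attributes(rows, attributes, target):
    """Order attributes from most to least informative about the target."""
    return sorted(
        attributes,
        key=lambda a: information_gain(rows, a, target),
        reverse=True,
    )


# Hypothetical toy records; attribute names and values are illustrative only.
rows = [
    {"attendance": "high", "department": "science", "result": "pass"},
    {"attendance": "high", "department": "arts", "result": "pass"},
    {"attendance": "low", "department": "science", "result": "fail"},
    {"attendance": "low", "department": "arts", "result": "fail"},
]

ranking = rank_attributes(rows, ["department", "attendance"], "result")
print(ranking)
```

In this toy data, `attendance` separates the classes perfectly while `department` carries no information about `result`, so `attendance` is ranked first. Evaluating the same measure after generalizing attribute values through a concept hierarchy would give the priorities at a higher level of abstraction.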