An Ensemble Classification Algorithm Based on Information Entropy for Data Streams
This is an incremental improvement for data stream mining, addressing concept drift detection with a new weighting method based on entropy.
The paper tackles concept drift in data streams by proposing an ensemble classification algorithm that uses information entropy to evaluate classification results and adjust classifier weights. On six databases, it achieves better classification accuracy and time performance than four comparison algorithms.
Data stream mining has attracted wide attention in machine learning and data mining. Recent studies have widely used ensemble classification for concept drift detection; however, most of them use classification accuracy as the criterion for judging whether concept drift has occurred. Information entropy is an important and effective measure of uncertainty. Based on information entropy theory, this paper develops a new algorithm that uses information entropy to evaluate classification results. The algorithm uses ensemble classification techniques, and the weight of each classifier is determined by the entropy of the results produced by the ensemble classifier system. When the concept in the data stream changes, all classifiers whose weights fall below a threshold are discarded at once, so that the ensemble adapts to the new concept. In the experiments, the proposed algorithm and four previously proposed algorithms are run on six databases. The results show that the proposed method not only handles concept drift effectively but also achieves better classification accuracy and time performance than the comparison algorithms.
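The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the stream arrives in labelled chunks and substitutes one concrete entropy-based score: the conditional entropy H(Y | Ŷ_i) of the true label given classifier i's prediction, normalised so that the weight is w_i = 1 − H(Y | Ŷ_i) / log2 K for K classes. On each chunk, the sketch re-weights every member, drops all members below a threshold in one step, and adds a classifier trained on the new chunk. The names `EntropyWeightedEnsemble`, `entropy_weight`, `train_stump`, `threshold` and `max_size`, as well as the one-feature stump base learner, are all hypothetical placeholders.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a trained base classifier maps a feature vector to a label in 0..K-1.
Classifier = Callable[[Sequence[float]], int]


def conditional_entropy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Estimate H(Y | Y_hat) in bits from one labelled chunk."""
    n = len(y_true)
    joint = Counter(zip(y_pred, y_true))  # counts of (predicted, true) pairs
    pred_counts = Counter(y_pred)         # counts of each predicted label
    h = 0.0
    for (pred, _true), count in joint.items():
        p_joint = count / n                 # P(Y_hat = pred, Y = true)
        p_cond = count / pred_counts[pred]  # P(Y = true | Y_hat = pred)
        h -= p_joint * math.log2(p_cond)
    return h


def entropy_weight(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """ASSUMED weighting rule: 1 - H(Y | Y_hat) / log2(K), in [0, 1] for K >= 2."""
    return 1.0 - conditional_entropy(y_true, y_pred) / math.log2(n_classes)


def train_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Classifier:
    """Hypothetical base learner: predicts 1 if feature 0 >= t, with t chosen for accuracy."""
    def correct(t: float) -> int:
        return sum((1 if x[0] >= t else 0) == yi for x, yi in zip(X, y))
    t_best = max(sorted({x[0] for x in X}), key=correct)
    return lambda x: 1 if x[0] >= t_best else 0


class EntropyWeightedEnsemble:
    """Sketch of an entropy-weighted ensemble with one-step threshold pruning."""

    def __init__(self, n_classes: int, threshold: float = 0.5, max_size: int = 10):
        self.n_classes = n_classes
        self.threshold = threshold  # hypothetical pruning threshold on weights
        self.max_size = max_size    # hypothetical cap on ensemble size
        self.members: List[Tuple[Classifier, float]] = []

    def predict(self, x: Sequence[float]) -> int:
        # Weighted majority vote over class labels 0..K-1.
        votes = [0.0] * self.n_classes
        for clf, w in self.members:
            votes[clf(x)] += w
        return max(range(self.n_classes), key=lambda k: votes[k])

    def update(self, X, y, train: Callable[..., Classifier]) -> int:
        """Process one labelled chunk; return how many members were discarded."""
        # Re-weight every existing member on the newest chunk.
        reweighted = [
            (clf, entropy_weight(y, [clf(x) for x in X], self.n_classes))
            for clf, _ in self.members
        ]
        # Discard, in a single step, every member whose weight is below the threshold.
        kept = [(clf, w) for clf, w in reweighted if w >= self.threshold]
        dropped = len(reweighted) - len(kept)
        # Add a member trained on the new chunk (initial weight 1.0 is an assumption).
        kept.append((train(X, y), 1.0))
        # Keep the highest-weighted members up to max_size.
        kept.sort(key=lambda cw: cw[1], reverse=True)
        self.members = kept[: self.max_size]
        return dropped
```

For example, train on a chunk labelled `1 if x >= 0.5`, then on a drifted chunk labelled `1 if x >= 0.8` with `threshold=0.6`. The old stump's weight falls to about 0.51, so it is discarded and only the new stump remains. One caveat about this substituted score: H(Y | Ŷ) is zero for any classifier whose predictions map deterministically to labels, including one that is consistently wrong. A practical version would therefore need an additional correctness check before using these weights in a vote.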