Updating Formulas and Algorithms for Computing Entropy and Gini Index from Time-Changing Data Streams
This work addresses a computational bottleneck faced by researchers and practitioners in data stream mining, though the contribution appears incremental in nature.
The paper tackles the inefficiency of periodic recomputation when updating entropy and Gini index in data stream mining. It provides simple incremental formulas and algorithms that are claimed to make updates more efficient, but it does not report concrete performance numbers.
Despite growing interest in data stream mining, the most successful incremental learners, such as VFDT, still use periodic recomputation to update attribute information gains and Gini indices. This note provides simple incremental formulas and algorithms for computing entropy and Gini index from time-changing data streams.
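The note's own formulas are not reproduced here. As one illustration of how such incremental updates can work, the following is a minimal Python sketch, assuming per-class counts n_i and total N are kept for each node. Entropy can be rewritten as H = log2(N) - (1/N) * sum_i n_i log2(n_i) and the Gini index as G = 1 - (sum_i n_i^2) / N^2, so maintaining the running sums S = sum_i n_i log2(n_i) and Q = sum_i n_i^2 lets each arriving or expiring example adjust a single term in O(1) time instead of recomputing over all classes. The class `IncrementalImpurity` and its methods are hypothetical names, and this is a standard count-based technique, not necessarily the paper's algorithm.

```python
import math
from collections import Counter


class IncrementalImpurity:
    """Hypothetical sketch: entropy and Gini index of a label distribution,
    updated in O(1) per insertion or deletion via running sums.
    Not necessarily the formulas used in the original note."""

    def __init__(self):
        self.counts = Counter()  # n_i: current count of each class label
        self.n = 0               # N: total number of examples
        self.s = 0.0             # S = sum_i n_i * log2(n_i)
        self.q = 0               # Q = sum_i n_i ** 2

    @staticmethod
    def _xlogx(x):
        # Convention: 0 * log2(0) = 0
        return x * math.log2(x) if x > 0 else 0.0

    def _update(self, label, delta):
        old = self.counts[label]  # Counter returns 0 for unseen labels
        new = old + delta
        if new < 0:
            raise ValueError("cannot remove a label that is not present")
        # Only this label's term changes, so adjust S and Q by the difference
        self.s += self._xlogx(new) - self._xlogx(old)
        self.q += new * new - old * old
        self.n += delta
        if new == 0:
            del self.counts[label]
        else:
            self.counts[label] = new

    def add(self, label):
        """Account for a newly arrived example."""
        self._update(label, +1)

    def remove(self, label):
        """Account for an example that has expired (e.g. left a window)."""
        self._update(label, -1)

    def entropy(self):
        # H = log2(N) - S / N
        if self.n == 0:
            return 0.0
        return math.log2(self.n) - self.s / self.n

    def gini(self):
        # G = 1 - Q / N^2
        if self.n == 0:
            return 0.0
        return 1.0 - self.q / (self.n * self.n)
```

For a time-changing stream, `remove` could be driven by a sliding window: when an example falls out of the window, its label is removed before the new one is added. Because S is a floating-point accumulator, rounding error can build up over very long streams; an occasional exact recomputation from `counts` is one possible safeguard (an assumption, not something the note prescribes).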