LG Dec 15, 2022

Forgetful Forests: high performance learning data structures for streaming data under concept drift

arXiv:2212.07876v1 1.8 h-index: 8

Originality: Incremental advance

AI Analysis

The paper provides a high-performance solution for handling concept drift in high-volume streaming applications. It is a strong, specific gain rather than a broad paradigm shift.

The paper tackles concept drift in streaming data by designing forgetful tree-based learning algorithms that run up to 24 times faster than state-of-the-art incremental algorithms with at most a 2% loss in accuracy.

Database research can help machine learning performance in many ways. One way is to design better data structures. This paper combines incremental computation with sequential and probabilistic filtering to enable "forgetful" tree-based learning algorithms that cope with concept drift (i.e., data whose function from input to classification changes over time). The forgetful algorithms described in this paper achieve high time performance while maintaining high-quality predictions on streaming data. Specifically, the algorithms are up to 24 times faster than state-of-the-art incremental algorithms with at most a 2% loss of accuracy, or at least twice as fast without any loss of accuracy. This makes such structures suitable for high-volume streaming applications.
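To make the idea of "forgetting" under concept drift concrete, here is a minimal Python sketch. It is not the paper's algorithm: it replaces the tree-based learner with a toy majority-label predictor, and the class name `ForgetfulMajority` and its parameters (`max_size`, `keep_fraction`, `drop_threshold`, `check_window`) are hypothetical names chosen for illustration. It only shows the general pattern of keeping statistics incrementally and discarding old data when recent accuracy falls.

```python
from collections import Counter, deque


class ForgetfulMajority:
    """Toy streaming classifier that predicts the majority label of retained data.

    Hypothetical illustration of forgetting under concept drift;
    this is NOT the algorithm from the paper.
    """

    def __init__(self, max_size=1000, keep_fraction=0.25,
                 drop_threshold=0.5, check_window=10):
        self.max_size = max_size              # hard cap on retained labels
        self.keep_fraction = keep_fraction    # share of data kept after a drift signal
        self.drop_threshold = drop_threshold  # recent accuracy below this triggers forgetting
        self.window = deque()                 # retained labels, oldest first
        self.counts = Counter()               # label counts, maintained incrementally
        self.recent_hits = deque(maxlen=check_window)  # outcomes of the latest predictions

    def predict(self):
        """Return the most frequent retained label, or None if nothing is retained."""
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def _forget_oldest(self):
        """Drop the oldest label and update the counts without a full recount."""
        label = self.window.popleft()
        self.counts[label] -= 1
        if self.counts[label] == 0:
            del self.counts[label]

    def update(self, label):
        """Score the current prediction against `label`, then learn from it."""
        guess = self.predict()
        if guess is not None:
            self.recent_hits.append(guess == label)

        self.window.append(label)
        self.counts[label] += 1

        # Enforce the size cap by forgetting the oldest items.
        while len(self.window) > self.max_size:
            self._forget_oldest()

        # If recent accuracy is poor, assume drift and forget most old data.
        if len(self.recent_hits) == self.recent_hits.maxlen:
            accuracy = sum(self.recent_hits) / len(self.recent_hits)
            if accuracy < self.drop_threshold:
                keep = max(1, int(len(self.window) * self.keep_fraction))
                while len(self.window) > keep:
                    self._forget_oldest()
                self.recent_hits.clear()


def demo():
    """Feed 100 'a' labels, then 20 'b' labels, and return the final prediction."""
    model = ForgetfulMajority()
    for label in ["a"] * 100 + ["b"] * 20:
        model.update(label)
    return model.predict()
```

In this sketch, a learner that never forgets would still predict "a" after the switch, since "a" outnumbers "b" 100 to 20. The forgetful version discards old labels once its recent predictions start failing, so it adapts after far fewer post-drift examples. The thresholds here are arbitrary and would need tuning on real data.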
