LG · CV · ML · Jan 29, 2020

stream-learn -- open-source Python library for difficult data stream batch analysis

arXiv:2001.11077v1 · 36 citations
AI Analysis

The library provides a practical tool for researchers and practitioners working with drifting and imbalanced data streams, but its contribution is incremental, since it builds on existing methodologies and implementations.

The authors introduced stream-learn, an open-source Python library for analyzing data streams with concept drift and class imbalance, featuring a stream generator for synthetic data and implementing established evaluation methodologies and efficient classifiers.
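To make the three drift types concrete, here is a minimal, stdlib-only sketch of how a generator could schedule a single drift between two one-dimensional Gaussian concepts. It is not stream-learn's implementation: the logistic transition, the `spacing` parameter and the names `new_concept_weight` and `sample` are assumptions made for illustration. In a sudden drift the new concept takes over at once; in a gradual drift each sample comes from either the old or the new concept, with the probability of the new one rising over time; in an incremental drift a single concept moves continuously from old to new. Recurring drifts (returning to an earlier concept) are not modelled here.

```python
import math
import random


def sigmoid(x: float, spacing: float) -> float:
    """Numerically stable logistic curve; a smaller spacing gives a sharper transition."""
    z = x / spacing
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def new_concept_weight(t: int, drift_point: int, kind: str, spacing: float = 5.0) -> float:
    """Hypothetical helper: weight of the new concept at time t for one drift at drift_point."""
    if kind == "sudden":
        return 1.0 if t >= drift_point else 0.0
    if kind in ("gradual", "incremental"):
        # both share the same smooth schedule; they differ in how sample() uses it
        return sigmoid(t - drift_point, spacing)
    raise ValueError(f"unknown drift kind: {kind!r}")


def sample(t: int, drift_point: int, kind: str, rng: random.Random,
           old_mean: float = 0.0, new_mean: float = 3.0) -> float:
    """Draw one value from the stream at time t (assumed 1-D Gaussian concepts)."""
    w = new_concept_weight(t, drift_point, kind)
    if kind == "incremental":
        # a single concept whose mean slides from old_mean to new_mean
        mean = (1.0 - w) * old_mean + w * new_mean
    else:
        # sudden/gradual: each sample comes from exactly one concept,
        # the new one with probability w
        mean = new_mean if rng.random() < w else old_mean
    return rng.gauss(mean, 1.0)
```

For example, `[new_concept_weight(t, 10, "gradual") for t in (0, 10, 20)]` rises from about 0.12 through exactly 0.5 at the drift point to about 0.88, whereas the `"sudden"` schedule jumps straight from 0 to 1 at `t = 10`.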

stream-learn is a Python package compatible with scikit-learn and developed for the analysis of drifting and imbalanced data streams. Its main component is a stream generator, which produces synthetic data streams that may incorporate any of the three main concept drift types (i.e. sudden, gradual and incremental drift) in either recurring or non-recurring form. The package supports experiments that follow established evaluation methodologies (i.e. Test-Then-Train and Prequential). In addition, it implements estimators adapted for data stream classification, including both simple classifiers and state-of-the-art chunk-based and online classifier ensembles. To improve computational efficiency, the package utilises its own implementations of prediction metrics for imbalanced binary classification tasks.
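The Test-Then-Train protocol named in the abstract can be sketched as follows: each incoming chunk is first used to evaluate the current model and only then to update it. This is a stdlib-only illustration under stated assumptions, not stream-learn's evaluator: `make_chunks`, `NearestCentroid`, `binary_scores` and `run_test_then_train` are hypothetical names, the toy classifier merely mimics scikit-learn's `partial_fit`/`predict` convention, and the first chunk is assumed to be used for training only. The metrics are computed directly from the binary confusion matrix with the minority class as the positive label. The Prequential protocol, which typically evaluates over a sliding window rather than disjoint chunks, is not shown.

```python
import math
import random


def make_chunks(n_chunks, chunk_size, minority=0.1, seed=0):
    """Hypothetical imbalanced stream: 2-D Gaussian classes, label 1 is the minority."""
    rng = random.Random(seed)
    chunks = []
    for _ in range(n_chunks):
        X, y = [], []
        for _ in range(chunk_size):
            label = 1 if rng.random() < minority else 0
            centre = 3.0 if label == 1 else 0.0
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
        chunks.append((X, y))
    return chunks


def binary_scores(y_true, y_pred):
    """Imbalance-aware metrics from the binary confusion matrix (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
    }


class NearestCentroid:
    """Toy incremental classifier keeping a running mean per class."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def partial_fit(self, X, y):
        for x, label in zip(X, y):
            s = self.sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                s[i] += v
            self.counts[label] = self.counts.get(label, 0) + 1
        return self

    def predict(self, X):
        centroids = {c: [v / self.counts[c] for v in s] for c, s in self.sums.items()}
        # assign each sample to the class whose centroid is closest
        return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in X]


def run_test_then_train(chunks, model):
    """Score each chunk with the current model, then train the model on that chunk."""
    history = []
    for i, (X, y) in enumerate(chunks):
        if i > 0:  # assumption: the first chunk is only used for training
            history.append(binary_scores(y, model.predict(X)))
        model.partial_fit(X, y)
    return history
```

Calling `run_test_then_train(make_chunks(10, 200), NearestCentroid())` returns nine score dictionaries, one per chunk after the first. Balanced accuracy and G-mean are reported instead of plain accuracy because, with a 10% minority class, a model that always predicts the majority class would already reach about 90% accuracy while its G-mean would be 0.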

Code Implementations: 1 repo