stream-learn -- open-source Python library for difficult data stream batch analysis
The library provides a practical tool for researchers and practitioners dealing with drifting and imbalanced data streams, but the contribution is incremental, since it builds on existing methodologies and implementations.
The authors introduce stream-learn, an open-source Python library for analyzing data streams with concept drift and class imbalance. It features a generator of synthetic streams, established evaluation methodologies, classifiers adapted to streams, and efficient in-house implementations of prediction metrics.
stream-learn is a Python package compatible with scikit-learn and developed for the analysis of drifting and imbalanced data streams. Its main component is a stream generator, which produces a synthetic data stream that may incorporate any of the three main concept drift types (i.e. sudden, gradual and incremental drift) in their recurring or non-recurring versions. The package allows conducting experiments following established evaluation methodologies (i.e. Test-Then-Train and Prequential). In addition, estimators adapted for data stream classification have been implemented, including both simple classifiers and state-of-the-art chunk-based and online classifier ensembles. To improve computational efficiency, the package uses its own implementations of prediction metrics for imbalanced binary classification tasks.