On the challenges to learn from Natural Data Streams
This work addresses a challenge relevant to machine learning practitioners, namely learning from real-world data streams, but its contribution is incremental, since it applies existing methods to a new data organization.
Using three datasets designed to replicate this setting, the paper evaluates the classification performance of various continual, streaming, and online learning algorithms trained on Natural Data Streams, which are characterized by streaming data, unbalanced distributions, data drift, and correlated samples.
In real-world contexts, data are sometimes available in the form of Natural Data Streams, i.e. data characterized by a streaming nature, an unbalanced distribution, data drift over a long time frame, and strong correlation among samples within short time ranges. Moreover, a clear separation between the traditional training and deployment phases is usually lacking. This way of organizing and consuming data represents an interesting and challenging scenario both for traditional Machine and Deep Learning algorithms and for incremental learning agents, i.e. agents able to incrementally improve their knowledge through past experience. In this paper, we investigate the classification performance of a variety of algorithms from different research fields, namely Continual, Streaming, and Online Learning, when they receive Natural Data Streams as training input. The experimental validation is carried out on three different datasets, expressly organized to replicate this challenging setting.
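The four properties of a Natural Data Stream, together with the missing split between training and deployment, can be made concrete with a small synthetic example. The sketch below is a hypothetical illustration only. It does not reproduce any of the three datasets or any algorithm evaluated in the paper. `natural_stream` is an invented toy generator, and `OnlineNearestCentroid` is an invented incremental learner. Evaluation follows the generic test-then-train (prequential) protocol, which is assumed here as one simple way to model the absence of separate training and deployment phases.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple

Sample = Tuple[List[float], int]


def natural_stream(n_samples: int, n_classes: int = 3, seed: int = 0) -> Iterator[Sample]:
    """Hypothetical toy 2-D stream mimicking the properties of a Natural
    Data Stream (an illustration, not one of the paper's datasets)."""
    rng = random.Random(seed)
    # Unbalanced distribution: class c is drawn with weight 2^(n_classes-1-c).
    weights = [2 ** (n_classes - 1 - c) for c in range(n_classes)]
    y = 0
    for t in range(n_samples):
        # Short-range correlation: with probability 0.9 the previous class
        # is kept, so consecutive samples arrive in runs of the same class.
        if rng.random() < 0.1:
            y = rng.choices(range(n_classes), weights=weights)[0]
        # Long-range drift: every class center moves slowly along axis 2.
        drift = 0.002 * t
        x = [3.0 * y + rng.gauss(0.0, 0.5), drift + rng.gauss(0.0, 0.5)]
        yield x, y


class OnlineNearestCentroid:
    """Hypothetical incremental learner: one exponentially weighted
    centroid per class, updated one sample at a time."""

    def __init__(self, alpha: float = 0.05) -> None:
        self.alpha = alpha  # larger alpha forgets faster and tracks drift
        self.centroids: Dict[int, List[float]] = {}

    def predict(self, x: List[float]) -> Optional[int]:
        if not self.centroids:
            return None  # nothing has been learned yet
        # Return the class whose centroid is closest in squared distance.
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
        )

    def learn(self, x: List[float], y: int) -> None:
        if y not in self.centroids:
            self.centroids[y] = list(x)  # first sample of a new class
        else:
            mu = self.centroids[y]
            self.centroids[y] = [(1 - self.alpha) * m + self.alpha * v for m, v in zip(mu, x)]


def prequential_accuracy(stream: Iterator[Sample], model: OnlineNearestCentroid) -> float:
    """Test-then-train: each sample is first used for evaluation and then
    for training, so there is no separate training or deployment phase."""
    correct = total = 0
    for x, y in stream:
        y_hat = model.predict(x)
        if y_hat is not None:
            correct += int(y_hat == y)
            total += 1
        model.learn(x, y)
    return correct / total if total else 0.0


if __name__ == "__main__":
    acc = prequential_accuracy(natural_stream(2000), OnlineNearestCentroid())
    print(f"prequential accuracy: {acc:.3f}")
```

In this toy setting, the exponential moving average is one assumed way to cope with drift: recent samples dominate each centroid. Rare classes, however, keep stale centroids between their occurrences, which hints at why unbalanced and temporally correlated streams are hard for incremental learners.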