LG, Jul 9, 2019
Contextual One-Class Classification in Data Streams
Richard Hugh Moulton, Herna L. Viktor, Nathalie Japkowicz et al.
In machine learning, the one-class classification problem occurs when training instances are only available from one class. It has been observed that making use of this class's structure, or its different contexts, may improve one-class classifier performance. Although this observation has been demonstrated for static data, a rigorous application of the idea within the data stream environment is lacking. To address this gap, we propose the use of context to guide one-class classifier learning in data streams, paying particular attention to the challenges presented by the dynamic learning environment. We present three frameworks that learn contexts and conduct experiments with synthetic and benchmark data streams. We conclude that the paradigm of contexts in data streams can be used to improve the performance of streaming one-class classifiers.
LG, Jan 30, 2019
The Wilderness Area Data Set: Adapting the Covertype data set for unsupervised learning
Richard Hugh Moulton, Jakub Zgraja
Benchmark data sets are of vital importance in machine learning research, as indicated by the number of repositories that exist to make them publicly available. Although many of these are usable in the stream mining context as well, it is less obvious which data sets can be used to evaluate data stream clustering algorithms. We note that the classic Covertype data set's size makes it attractive for use in stream mining but unfortunately it is specifically designed for classification. Here we detail the process of transforming the Covertype data set into one amenable for unsupervised learning, which we call the Wilderness Area data set. Our quantitative analysis allows us to conclude that the Wilderness Area data set is more appropriate for unsupervised learning than the original Covertype data set.