LG | Apr 24, 2024

Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification

arXiv:2404.15836v1 | 2 citations | h-index: 10 | ECML/PKDD
Originality: Incremental advance
AI Analysis

This work addresses the challenge of handling concept drift and class imbalance in data streams, which matters for applications such as real-time monitoring. The contribution is incremental, since it adapts existing encoding and neural network techniques to a new task.

The paper tackles the classification of difficult data streams that exhibit concept drift and high imbalance. It proposes SSTML, which uses Multi-Dimensional Encoding to transform tabular data into images and trains a ResNet-18 model on them. SSTML achieves classification quality statistically significantly superior to state-of-the-art algorithms while maintaining comparable processing time.
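The core idea of Multi-Dimensional Encoding is to turn each tabular record into an image that a convolutional network can consume. The sketch below is a simplified, dependency-free stand-in and is not the STML encoding used by SSTML: STML, as the "two-dimensional word embedding" in the title suggests, renders feature values as text on the image, whereas here each value, min-max scaled within its own chunk, is painted as a uniform intensity block on a grid. The function names `minmax_scale`, `encode_row`, and `encode_chunk`, the 32-pixel default canvas, and the per-chunk scaling are hypothetical choices made only for illustration.

```python
from math import ceil, sqrt
from typing import List

Image = List[List[float]]  # one grayscale image: rows of pixel intensities in [0, 1]


def minmax_scale(chunk: List[List[float]]) -> List[List[float]]:
    """Scale every feature (column) to [0, 1] using the min/max of this chunk only."""
    n_features = len(chunk[0])
    lows = [min(row[j] for row in chunk) for j in range(n_features)]
    highs = [max(row[j] for row in chunk) for j in range(n_features)]
    return [
        [
            # Constant features (max == min) are mapped to 0.0 to avoid dividing by zero.
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ]
        for row in chunk
    ]


def encode_row(values: List[float], size: int = 32) -> Image:
    """Paint each scaled value as a uniform square patch on a size x size canvas.

    Features are placed row-major on a ceil(sqrt(n)) x ceil(sqrt(n)) grid of cells;
    pixels not covered by any cell stay 0.0.
    """
    grid = ceil(sqrt(len(values)))
    cell = size // grid
    if cell == 0:
        raise ValueError("image too small for the number of features")
    image = [[0.0] * size for _ in range(size)]
    for idx, v in enumerate(values):
        r0, c0 = (idx // grid) * cell, (idx % grid) * cell
        for r in range(r0, r0 + cell):
            for c in range(c0, c0 + cell):
                image[r][c] = v
    return image


def encode_chunk(chunk: List[List[float]], size: int = 32) -> List[Image]:
    """Encode a chunk of tabular rows as one image per row (hypothetical helper)."""
    return [encode_row(row, size) for row in minmax_scale(chunk)]


# Usage: two rows with three features each, rendered on a 4 x 4 canvas.
images = encode_chunk([[1.0, 10.0, 5.0], [3.0, 20.0, 5.0]], size=4)
```

Scaling with each chunk's own statistics keeps the encoding local to the current chunk, which is one plausible choice under drift; a real pipeline might instead keep running statistics across chunks.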

Rapid technological advances are inherently linked to the increasing amount of data, a substantial portion of which can be interpreted as data streams that may exhibit concept drift and a high imbalance ratio. Consequently, developing new approaches to classifying difficult data streams is a rapidly growing research area. At the same time, the proliferation of deep learning and transfer learning, together with the success of convolutional neural networks in computer vision, has given rise to a new research trend, Multi-Dimensional Encoding (MDE), which transforms tabular data into a homogeneous form of a discrete digital signal. This paper proposes Streaming Super Tabular Machine Learning (SSTML), exploring for the first time the potential of MDE for difficult data stream classification. SSTML encodes consecutive data chunks into an image representation using the STML algorithm and then performs a single ResNet-18 training epoch on each encoded chunk. Experiments on synthetic and real data streams show that SSTML achieves classification quality statistically significantly superior to state-of-the-art algorithms while maintaining comparable processing time.
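SSTML processes the stream chunk by chunk, encoding each chunk and then running one training epoch on it. A common way to evaluate such chunk-based learners is test-then-train (prequential) evaluation: each chunk is first scored by the current model and only then used for training. The abstract does not state the evaluation protocol, so treat this as an assumption. The sketch below uses balanced accuracy, a metric suited to imbalanced streams, and a hypothetical `NearestCentroidStandIn` in place of ResNet-18 so that it runs on the standard library alone; `prequential_evaluate` and its `encode` hook (where an image encoding such as STML would go) are likewise illustrative names, not the authors' code.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Row = List[float]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall, so a rare class counts as much as the majority class."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


class NearestCentroidStandIn:
    """Toy stand-in for ResNet-18 (hypothetical, not from the paper)."""

    def __init__(self) -> None:
        self.centroids: Dict[int, Row] = {}

    def train_one_epoch(self, xs: Sequence[Row], ys: Sequence[int]) -> None:
        # For this stand-in, one "epoch" recomputes the centroids of the classes present
        # in the chunk; classes absent from the chunk keep their previous centroid.
        groups: Dict[int, List[Row]] = {}
        for x, y in zip(xs, ys):
            groups.setdefault(y, []).append(x)
        for y, rows in groups.items():
            self.centroids[y] = [sum(col) / len(col) for col in zip(*rows)]

    def predict(self, xs: Sequence[Row]) -> List[int]:
        def dist2(a: Row, b: Row) -> float:
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        # Assign each input to the class with the nearest centroid.
        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c])) for x in xs]


def prequential_evaluate(
    stream: Iterable[Tuple[List[Row], List[int]]],
    model: NearestCentroidStandIn,
    encode: Callable[[List[Row]], List[Row]],
) -> List[float]:
    """Test-then-train: score each chunk with the current model, then train on it once."""
    scores: List[float] = []
    for i, (x_chunk, y_chunk) in enumerate(stream):
        inputs = encode(x_chunk)  # in SSTML this step would be the STML image encoding
        if i > 0:  # the first chunk only trains; there is no model to test yet
            scores.append(balanced_accuracy(y_chunk, model.predict(inputs)))
        model.train_one_epoch(inputs, y_chunk)
    return scores


# Usage: a two-chunk stream of one-feature rows, with an identity encoding.
stream = [
    ([[0.0], [0.1], [1.0]], [0, 0, 1]),
    ([[0.05], [0.9], [1.1]], [0, 1, 1]),
]
scores = prequential_evaluate(stream, NearestCentroidStandIn(), encode=lambda chunk: chunk)
```

Updating only the classes present in a chunk is a design choice for the toy model: a rare class that happens to be missing from a chunk keeps its previous centroid instead of disappearing from the model.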
