CL LG · Jul 15, 2024

Employing Sentence Space Embedding for Classification of Data Stream from Fake News Domain

Paweł Zyblewski, Jakub Klikowski, Weronika Borek-Marciniec, Paweł Ksieniewicz

arXiv:2407.10807v21.0 · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses fake news detection by introducing a novel method for classifying text data streams, though the contribution appears incremental because it adapts existing techniques to a new domain.

The paper tackles fake news detection by applying sentence space embedding to convert text into discrete signals, enabling convolutional networks designed for images to classify natural language data streams. On the Fakeddit dataset, the method was evaluated against state-of-the-art algorithms for generalization ability and time complexity.

Tabular data is considered the last unconquered castle of deep learning, yet data stream classification is said to be an equally important and demanding research area. Because of its temporal constraints, deep learning methods are assumed not to be the optimal solution in this field. However, excluding this entire, prevalent group of methods seems rather rash given the progress made in their development in recent years. For this reason, this paper is the first to present an approach to natural language data stream classification using the sentence space method, which encodes text as a discrete digital signal. This makes it possible to use convolutional deep networks designed for image classification to recognize fake news from text data. On the real-life Fakeddit dataset, the proposed approach was compared with state-of-the-art data stream classification algorithms in terms of generalization ability and time complexity.
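To make the encoding idea concrete, here is a minimal sketch of turning a text into a fixed-size 2D grid that an image-style convolutional network could consume. It is an illustration under stated assumptions, not the authors' implementation: `toy_sentence_vector` is a hypothetical hash-based stand-in for a real sentence encoder, and the names `sentence_space`, `max_sentences`, and `dim` are invented for this example.

```python
import hashlib
import re


def toy_sentence_vector(sentence, dim=8):
    """Hypothetical stand-in for a sentence encoder: hash bytes scaled to [0, 1]."""
    if dim > 32:
        raise ValueError("a SHA-256 digest provides only 32 bytes")
    digest = hashlib.sha256(sentence.lower().encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def sentence_space(text, max_sentences=4, dim=8):
    """Stack per-sentence vectors into a max_sentences x dim grid."""
    # Naive sentence splitting on terminal punctuation (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Truncate long texts so every sample has the same number of rows.
    rows = [toy_sentence_vector(s, dim) for s in sentences[:max_sentences]]
    # Zero-pad short texts to the same fixed shape.
    rows += [[0.0] * dim for _ in range(max_sentences - len(rows))]
    return rows


post = "Scientists found water on Mars. Experts doubt the claim! Is it real?"
grid = sentence_space(post)
print(len(grid), len(grid[0]))  # 4 8
```

Each row of the grid corresponds to one sentence, so the whole post becomes a small single-channel "image" of fixed shape. In a streaming setting, each incoming chunk of posts would be encoded this way before being passed to the classifier.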
