CL · Nov 19, 2023

Spot the Bot: Distinguishing Human-Written and Bot-Generated Texts Using Clustering and Information Theory Techniques

arXiv:2311.11441v1 · 2 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of identifying bot-generated text, which matters for applications such as content moderation. It offers an unsupervised approach that is an incremental advance over existing supervised methods.

The paper tackles the problem of distinguishing bot-generated from human-written texts. It proposes an unsupervised algorithm based on clustering and information theory that robustly detects text from different types of bots without requiring large amounts of labeled data.
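The summary above describes a pipeline built partly on clustering, and the abstract below distinguishes crisp from fuzzy clustering and reports that human texts form fuzzier clusters. This page does not say which fuzzy algorithm or compactness score the authors use. As an assumption, the sketch below uses standard fuzzy c-means (Bezdek's algorithm) and the partition coefficient as a fuzziness score. The function names and the synthetic 2-D blobs, which stand in for text embeddings, are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means (Bezdek). Returns (centers, memberships U of shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each point's memberships sum to 1
    for _ in range(max_iter):
        W = U ** m  # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every center, floored to avoid division by zero.
        D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)), normalized per point.
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_coefficient(U):
    """Bezdek's partition coefficient: 1.0 for crisp partitions, 1/c for maximally fuzzy ones."""
    return float(np.mean(np.sum(U ** 2, axis=1)))


# Hypothetical stand-ins for text embeddings: two tight, well-separated groups
# versus two heavily overlapping groups.
rng = np.random.default_rng(42)
separated = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
overlapping = np.vstack([rng.normal(0.0, 1.5, (50, 2)), rng.normal(1.0, 1.5, (50, 2))])

for name, X in [("separated", separated), ("overlapping", overlapping)]:
    _, U = fuzzy_c_means(X, c=2)
    print(f"{name}: partition coefficient = {partition_coefficient(U):.3f}")
```

A partition coefficient near 1 indicates compact, well-separated groups, and a value near 1/c indicates fuzzy, overlapping ones. Under the abstract's finding, bot texts would be expected toward the first case and human texts toward the second. This is only an illustration of the idea and not the paper's scoring method.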

With the development of generative models like GPT-3, it is increasingly challenging to differentiate generated texts from human-written ones. Many studies have demonstrated good results in bot identification. However, most of these works rely on supervised learning methods that require labelled data and/or prior knowledge of the bot-model architecture. In this work, we propose a bot identification algorithm that is based on unsupervised learning techniques and does not depend on a large amount of labelled data. By combining findings from semantic analysis by clustering (crisp and fuzzy) with information-theoretic techniques, we construct a robust model that detects generated text from different types of bots. We find that generated texts tend to be more chaotic, while literary works are more complex. We also demonstrate that clustering human texts yields fuzzier clusters than the more compact and well-separated clusters of bot-generated texts.
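The abstract's contrast between "chaotic" generated text and "complex" literary text suggests an entropy-complexity analysis. A standard information-theoretic tool for this is the Bandt-Pompe complexity-entropy plane. It pairs the normalized permutation entropy H of a numeric series with the Jensen-Shannon statistical complexity C = Q_J · H. This page does not confirm that the authors use exactly these measures, so the sketch below is an assumption. It also uses word lengths as a hypothetical way to turn a text into a numeric series. The function names are illustrative.

```python
import math
import random
from collections import Counter
from itertools import permutations


def ordinal_distribution(series, d=3):
    """Relative frequencies of the d! Bandt-Pompe ordinal patterns (delay 1)."""
    patterns = list(permutations(range(d)))
    counts = Counter()
    for i in range(len(series) - d + 1):
        window = series[i:i + d]
        # Ordinal pattern = indices of the window sorted by value (ties broken by position).
        counts[tuple(sorted(range(d), key=lambda k: window[k]))] += 1
    total = sum(counts.values())
    return [counts[p] / total for p in patterns]


def shannon(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def entropy_complexity(series, d=3):
    """Return (H, C): normalized permutation entropy and Jensen-Shannon complexity."""
    p = ordinal_distribution(series, d)
    n = len(p)  # n = d!
    u = [1.0 / n] * n  # uniform reference distribution
    h = shannon(p) / math.log(n)  # in [0, 1]
    mix = [(a + b) / 2 for a, b in zip(p, u)]
    js = shannon(mix) - shannon(p) / 2 - shannon(u) / 2  # Jensen-Shannon divergence
    # Q0 rescales the divergence so that its maximum (a delta distribution) equals 1.
    q0 = -2.0 / (((n + 1) / n) * math.log(n + 1) - 2 * math.log(2 * n) + math.log(n))
    return h, q0 * js * h


# Hypothetical input: word lengths as a crude numeric series derived from a text.
text = ("it was a bright cold day in april and the clocks were striking thirteen "
        "while the wind swept little eddies of dust along the empty street")
lengths = [len(w) for w in text.split()]
print("word lengths:", entropy_complexity(lengths))

# Reference point: i.i.d. noise should land near H = 1, C = 0.
rng = random.Random(0)
noise = [rng.random() for _ in range(5000)]
print("noise:", entropy_complexity(noise))
```

In this plane, noise-like series sit near H = 1 with C close to 0, and fully ordered series sit at H = 0. Series with more structure reach higher C at intermediate entropy. Longer series and larger d give more reliable estimates, but real texts would need a more meaningful series than word lengths.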

