Measuring AI "Slop" in Text
This work addresses the lack of standardized metrics for evaluating the quality of AI-generated text, a gap that affects researchers and practitioners in NLP and AI ethics. Its contribution is incremental: it establishes a framework rather than delivering a breakthrough.
The authors define and measure low-quality AI-generated text, known as "slop", by developing a taxonomy and a set of interpretable dimensions for assessment. They find that binary "slop" judgments, although somewhat subjective, correlate with latent dimensions such as coherence and relevance.
AI "slop" is an increasingly popular term used to describe low-quality AI-generated text, but there is currently no agreed-upon definition of this term and no means to measure its occurrence. In this work, we develop a taxonomy of "slop" through interviews with experts in NLP, writing, and philosophy, and propose a set of interpretable dimensions for its assessment in text. Through span-level annotation, we find that binary "slop" judgments are (somewhat) subjective, but such determinations nonetheless correlate with latent dimensions such as coherence and relevance. Our framework can be used to evaluate AI-generated text in both detection and binary preference tasks, potentially offering new insights into the linguistic and stylistic factors that contribute to quality judgments.
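One way to picture the reported relationship between binary "slop" judgments and dimensions such as coherence and relevance is a point-biserial correlation, which is simply the Pearson correlation between a 0/1 label and a numeric score. The sketch below is illustrative only: the abstract does not say which statistic the authors used, the 1-to-5 rating scale is an assumption, and the toy annotations are invented for the example rather than taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy annotations (NOT data from the paper).
# slop: 1 if an annotator judged the text to be "slop", else 0.
slop = [1, 1, 0, 0, 1, 0]

# Assumed 1-5 ratings per text for two of the dimensions named in the abstract.
dimension_scores = {
    "coherence": [2, 1, 4, 5, 2, 4],
    "relevance": [3, 2, 4, 5, 1, 5],
}

correlations = {
    name: pearson(slop, scores) for name, scores in dimension_scores.items()
}

for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

With these made-up ratings both coefficients come out negative, meaning texts labeled "slop" tend to score lower on each dimension. On real annotations, a coefficient near zero would suggest that the dimension carries little information about the binary judgment.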