CL, Sep 4, 2024

Oddballness: universal anomaly detection with language models

Filip Graliński, Ryszard Staruch, Krzysztof Jurkiewicz

arXiv:2409.03046v112.220 citations, h-index: 3

Originality: Incremental advance

AI Analysis

The paper addresses anomaly detection in text and other sequences, with applications such as error detection. The contribution appears incremental because it builds on existing language model approaches.

The paper tackles unsupervised anomaly detection in sequences by introducing "oddballness", a new metric that measures how strange a token is according to language model probabilities. It reports that oddballness outperforms low-likelihood methods on grammatical error detection tasks when a fully unsupervised setup is assumed.

We present a new method to detect anomalies in texts (in general: in sequences of any data), using language models, in a totally unsupervised manner. The method considers probabilities (likelihoods) generated by a language model, but instead of focusing on low-likelihood tokens, it considers a new metric introduced in this paper: oddballness. Oddballness measures how "strange" a given token is according to the language model. We demonstrate in grammatical error detection tasks (a specific case of text anomaly detection) that oddballness is better than just considering low-likelihood events, if a totally unsupervised setup is assumed.
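
The abstract does not state how oddballness is computed, so the following is only a minimal sketch of why a distribution-aware score can differ from a plain low-likelihood score. The function `relative_strangeness` and its formula (the total probability mass by which other tokens exceed the observed token) are assumptions made here for illustration, not a definition taken from this page; `low_likelihood_score` is likewise a hypothetical baseline.

```python
def low_likelihood_score(probs, i):
    """Baseline: a token is suspicious simply because its probability is low."""
    return 1.0 - probs[i]


def relative_strangeness(probs, i):
    """Illustrative score (an assumption, not the paper's stated formula):
    sum of how much every other token's probability exceeds that of token i.
    It is 0 for the most likely token and grows when a low-probability token
    appears where the model strongly preferred something else."""
    p_i = probs[i]
    return sum(max(0.0, p_j - p_i) for p_j in probs)


# Two next-token distributions in which the observed token has probability 0.05.
peaked = [0.9, 0.05, 0.05]   # the model is confident about another token
flat = [0.05] * 20           # the model has no strong preference

# The low-likelihood baseline cannot tell the two situations apart.
same_baseline = low_likelihood_score(peaked, 1) == low_likelihood_score(flat, 0)

# The distribution-aware score flags only the peaked case.
score_peaked = relative_strangeness(peaked, 1)
score_flat = relative_strangeness(flat, 0)
```

In this sketch the observed token is equally unlikely in both cases, yet only the peaked distribution yields a high score, which matches the abstract's point that low likelihood alone is not the same as being "strange". In practice `probs` would come from a language model's next-token distribution.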
