LG, Dec 22, 2020

Probabilistic Outlier Detection and Generation

Stefano Giovanni Rizzo, Linsey Pang, Yixian Chen, Sanjay Chawla

arXiv:2012.12394v1

AI Analysis

This work addresses the problem of simultaneously detecting and generating outliers, which matters in applications such as fraud detection and cybersecurity, and offers practitioners in these fields a new approach.

This paper introduces WALDO, a Wasserstein double autoencoder, for detecting and generating outliers by mapping data into a space of probability distributions. The method is evaluated on MNIST, CIFAR10, and KDD99 datasets for detection accuracy and robustness, and demonstrated on retail sales data and for simulating intrusion attacks.

A new method for outlier detection and generation is introduced by lifting data into the space of probability distributions, which are not analytically expressible but from which samples can be drawn using a neural generator. Given a mixture of unknown latent inlier and outlier distributions, a Wasserstein double autoencoder is used to both detect and generate inliers and outliers. The proposed method, named WALDO (Wasserstein Autoencoder for Learning the Distribution of Outliers), is evaluated on classical data sets including MNIST, CIFAR10, and KDD99 for detection accuracy and robustness. We give an example of outlier detection on a real retail sales data set and an example of outlier generation for simulating intrusion attacks. Beyond these, we foresee many application scenarios where WALDO can be used. To the best of our knowledge, this is the first work that studies outlier detection and generation together.
