TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models
This work addresses anomaly identification in contaminated tabular datasets, a known bottleneck for researchers and practitioners, and offers an incremental improvement through a novel method.
The paper tackles unsupervised anomaly detection in tabular data by proposing a diffusion-based probabilistic model that learns the density of normal samples, using a rejection scheme to reduce the influence of anomalies, and demonstrates improved detection over baselines on real data.
Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we only assume access to contaminated data and present a diffusion-based probabilistic model effective for unsupervised anomaly detection. Our model is trained to learn the density of normal samples by utilizing a unique rejection scheme to attenuate the influence of anomalies on the density estimation. At inference, we identify anomalies as samples in low-density regions. We use real data to demonstrate that our method improves detection capabilities over baselines. Furthermore, our method is relatively stable with respect to the dimension of the data and does not require extensive hyperparameter tuning.
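The general idea of fitting a density on contaminated data while rejecting likely anomalies, then scoring samples by low density, can be illustrated with a small sketch. This is not the paper's model: a diagonal Gaussian stands in for the diffusion model to keep the example short, and the rejection rule used here (drop the highest-scoring fraction of rows before each refit) is an assumption, since the abstract does not specify the scheme. The names `fit_with_rejection`, `reject_frac`, and `n_rounds` are hypothetical, and the data is synthetic.

```python
import math
import random


def fit_diag_gaussian(rows):
    """Per-feature mean and variance (a stand-in for the density model)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    var = [max(sum((r[j] - mean[j]) ** 2 for r in rows) / n, 1e-6)
           for j in range(d)]
    return mean, var


def neg_log_density(row, mean, var):
    """Anomaly score: a higher value means a lower estimated density."""
    return sum(0.5 * math.log(2 * math.pi * v) + (x - m) ** 2 / (2 * v)
               for x, m, v in zip(row, mean, var))


def fit_with_rejection(rows, reject_frac=0.1, n_rounds=5):
    """Refit repeatedly, excluding the highest-scoring rows from each refit.

    Assumed rejection rule for illustration only.
    """
    mean, var = fit_diag_gaussian(rows)
    n_keep = len(rows) - int(reject_frac * len(rows))
    for _ in range(n_rounds):
        ranked = sorted(rows, key=lambda r: neg_log_density(r, mean, var))
        mean, var = fit_diag_gaussian(ranked[:n_keep])
    return mean, var


# Synthetic contaminated table: 190 normal rows followed by 10 anomalies.
random.seed(0)
normal = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(190)]
anomalies = [[random.gauss(6.0, 0.5), random.gauss(6.0, 0.5)] for _ in range(10)]
data = normal + anomalies

mean, var = fit_with_rejection(data, reject_frac=0.1)
scores = [neg_log_density(r, mean, var) for r in data]

# Indices of the ten rows in the lowest-density regions.
top = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:10]
```

In this sketch the rejected rows are still scored at inference; they are only excluded from the density fit, so the estimate is pulled less toward the contaminating samples.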