LG DB Oct 20, 2016

Multilevel Anomaly Detection for Mixed Data

arXiv:1610.06249v1

Originality: Incremental advance

AI Analysis

The paper addresses anomaly detection in domains with mixed data types, but the advance is incremental because it builds on existing ensemble and deep learning techniques.

The paper tackles unsupervised anomaly detection in high-dimensional mixed data by proposing MIXMAD, an ensemble method that uses multiple levels of abstraction, and reports that it outperforms existing methods on real-world datasets.

Anomalies are data points that deviate from the norm. Unsupervised anomaly detection often translates to identifying low-density regions. Major problems arise when the data is high-dimensional and mixes discrete and continuous attributes. We propose MIXMAD, which stands for MIXed data Multilevel Anomaly Detection, an ensemble method that estimates the sparse regions across multiple levels of abstraction of mixed data. The hypothesis is that, in domains where multiple data abstractions exist, a data point may be anomalous with respect to the raw representation or to more abstract representations. To this end, our method sequentially constructs an ensemble of Deep Belief Nets (DBNs) with varying depths. Each DBN is an energy-based detector at a predefined abstraction level. At the bottom level of each DBN, there is a Mixed-variate Restricted Boltzmann Machine that models the density of mixed data. Predictions across the ensemble are finally combined via rank aggregation. The proposed MIXMAD is evaluated on high-dimensional real-world datasets of different characteristics. The results demonstrate that, for anomaly detection, (a) multilevel abstraction of high-dimensional and mixed data is a sensible strategy, and (b) empirically, MIXMAD is superior to popular unsupervised detection methods for both homogeneous and mixed data.
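
The final step in the abstract, combining predictions across the ensemble via rank aggregation, can be illustrated with a minimal sketch. The abstract does not say which aggregation rule MIXMAD uses, so the mean-rank rule below is an assumption chosen for simplicity, and the function names (rank_scores, aggregate_ranks) and the score values are hypothetical. Each row stands in for the anomaly scores of one detector (for example, a free energy from a DBN of a given depth), with a higher score meaning more anomalous.

```python
import numpy as np

def rank_scores(scores):
    """Convert one detector's scores to ranks; rank 1 = most anomalous.

    Assumes higher score means more anomalous. Ties are broken by
    order of appearance (stable sort), which is a simplification.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def aggregate_ranks(score_matrix):
    """Mean-rank aggregation over detectors (an assumed rule, not
    necessarily the one used in the paper).

    score_matrix has shape (n_detectors, n_points). Ranking first makes
    detectors with different score scales comparable.
    """
    score_matrix = np.asarray(score_matrix, dtype=float)
    ranks = np.vstack([rank_scores(row) for row in score_matrix])
    return ranks.mean(axis=0)

# Hypothetical scores from three detectors over five data points.
scores = [
    [0.1, 0.9, 0.2, 0.3, 0.8],
    [1.0, 5.0, 2.0, 1.5, 4.0],
    [10.0, 30.0, 12.0, 11.0, 50.0],
]
mean_ranks = aggregate_ranks(scores)
most_anomalous = int(np.argmin(mean_ranks))  # index with the lowest mean rank
print(mean_ranks, most_anomalous)
```

Working on ranks rather than raw scores avoids having to calibrate energies from networks of different depths against each other; only the ordering each detector produces enters the final decision.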
