LG · Jul 29, 2015

Learning Representations for Outlier Detection on a Budget

Barbora Micenková, Brian McWilliams, Ira Assent

arXiv:1507.08104v15.428 citations

Originality: Incremental advance

AI Analysis

This work addresses outlier detection for applications ranging from fraud detection to high-energy physics. It is an incremental improvement: it combines existing unsupervised and supervised approaches to balance predictive performance against prediction-time computational cost.

The paper proposes BORE, a method for detecting a small number of outliers in large datasets that uses unsupervised outlier scoring functions as features for a supervised learner. It is designed to cope with class imbalance and prediction-time cost constraints, and the authors report good performance against competing methods on 12 real-world datasets in both the non-budgeted and the budgeted setting.

The problem of detecting a small number of outliers in a large dataset is an important task in many fields from fraud detection to high-energy physics. Two approaches have emerged to tackle this problem: unsupervised and supervised. Supervised approaches require a sufficient amount of labeled data and are challenged by novel types of outliers and inherent class imbalance, whereas unsupervised methods do not take advantage of available labeled training examples and often exhibit poorer predictive performance. We propose BORE (a Bagged Outlier Representation Ensemble) which uses unsupervised outlier scoring functions (OSFs) as features in a supervised learning framework. BORE is able to adapt to arbitrary OSF feature representations, to the imbalance in labeled data as well as to prediction-time constraints on computational cost. We demonstrate the good performance of BORE compared to a variety of competing methods in the non-budgeted and the budgeted outlier detection problem on 12 real-world datasets.
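
The core idea in the abstract, using unsupervised outlier scores as features for a supervised learner, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two scoring functions (`knn_score`, `centroid_score`), the logistic-regression learner, the balanced subsampling used for bagging, and the synthetic data. The budgeted variant (limiting which scores are computed at prediction time) is not covered.

```python
import math
import random

def knn_score(x, data, k=3):
    """Hypothetical OSF 1: mean distance from x to its k nearest neighbours."""
    dists = sorted(math.dist(x, p) for p in data if p is not x)
    return sum(dists[:k]) / k

def centroid_score(x, data):
    """Hypothetical OSF 2: distance from x to the centroid of the data."""
    centroid = [sum(col) / len(data) for col in zip(*data)]
    return math.dist(x, centroid)

def osf_features(x, data):
    """Represent a point by its unsupervised outlier scores."""
    return [knn_score(x, data), centroid_score(x, data)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, feats):
    # The last weight is the bias, matched with a constant 1.0 feature.
    return sigmoid(sum(a * b for a, b in zip(w, feats + [1.0])))

def train_logreg(X, y, epochs=300, lr=0.1):
    """Logistic regression fitted by plain batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, v in enumerate(xi + [1.0]):
                grad[j] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def train_bagged(F, y, n_bags=5, rng=None):
    """Assumed bagging scheme: each bag sees all labeled outliers plus an
    equally sized random sample of inliers, to counter class imbalance."""
    rng = rng or random.Random(0)
    pos = [f for f, t in zip(F, y) if t == 1]
    neg = [f for f, t in zip(F, y) if t == 0]
    models = []
    for _ in range(n_bags):
        sample = rng.sample(neg, len(pos))
        models.append(train_logreg(pos + sample,
                                   [1] * len(pos) + [0] * len(sample)))
    return models

def ensemble_score(x, data, models):
    """Average the bagged models' outlier probabilities for a new point."""
    feats = osf_features(x, data)
    return sum(predict(w, feats) for w in models) / len(models)

# Synthetic example: 200 Gaussian inliers and 10 outliers on a distant ring.
rng = random.Random(0)
inliers = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
outliers = []
for _ in range(10):
    r, a = rng.uniform(6, 8), rng.uniform(0, 2 * math.pi)
    outliers.append([r * math.cos(a), r * math.sin(a)])
data = inliers + outliers
labels = [0] * 200 + [1] * 10

F = [osf_features(x, data) for x in data]
models = train_bagged(F, labels, rng=rng)
score_in = ensemble_score([0.1, -0.2], data, models)
score_out = ensemble_score([7.0, 7.0], data, models)
print(score_in, score_out)
```

In this sketch the point far from the data should receive the higher ensemble score. A practical version would substitute established scoring functions and a tuned learner for the toy components used here.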
