ML · LG · Sep 20, 2023

Distribution and volume based scoring for Isolation Forests

arXiv:2309.11450v1 · 3 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work makes incremental improvements to how Isolation Forests score points for anomaly detection.

The authors propose two new scoring functions for anomaly detection with Isolation Forests: one that uses the full distribution of scores across trees rather than only their average, and another based on the hyper-volumes of leaf nodes. They evaluated both on 34 datasets from ADBench, finding significant improvements over the standard Isolation Forest on some datasets for both variants and an average gain across all datasets for one of the two.

We make two contributions to the Isolation Forest method for anomaly and outlier detection. The first contribution is an information-theoretically motivated generalisation of the score function that is used to aggregate the scores across random tree estimators. This generalisation allows one to take into account not just the ensemble average across trees but instead the whole distribution. The second contribution is an alternative scoring function at the level of the individual tree estimator, in which we replace the depth-based scoring of the Isolation Forest with one based on hyper-volumes associated to an isolation tree's leaf nodes. We motivate the use of both of these methods on generated data and also evaluate them on 34 datasets from the recent and exhaustive "ADBench" benchmark, finding significant improvement over the standard isolation forest for both variants on some datasets and improvement on average across all datasets for one of the two variants. The code to reproduce our results is made available as part of the submission.
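To make the two ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a small isolation forest from scratch and records, for each tree, both the path length and the log hyper-volume of the leaf a point lands in. Everything here is an assumption made for illustration: the function names (`fit_forest`, `per_tree_values`, `power_mean_score`, `volume_score`) are hypothetical, the power mean is a stand-in for a distribution-aware aggregate rather than the paper's information-theoretic score, and the mean log leaf volume is one plausible reading of volume-based scoring rather than the paper's formula.

```python
import numpy as np


def c_factor(n):
    """Usual Isolation Forest normaliser: expected unsuccessful BST search length."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def build_tree(X, lo, hi, depth, max_depth, rng):
    """Grow one isolation tree; each leaf stores depth, size and log hyper-volume."""
    n = len(X)
    # Features on which the node's points still differ (none if n <= 1).
    splittable = np.flatnonzero(X.max(axis=0) > X.min(axis=0)) if n > 1 else []
    if depth >= max_depth or len(splittable) == 0:
        log_vol = float(np.sum(np.log(np.maximum(hi - lo, 1e-12))))
        return {"depth": depth, "size": n, "log_vol": log_vol}
    f = rng.choice(splittable)
    t = rng.uniform(X[:, f].min(), X[:, f].max())
    go_left = X[:, f] < t
    left_hi, right_lo = hi.copy(), lo.copy()
    left_hi[f] = right_lo[f] = t  # the split shrinks each child's box along feature f
    return {
        "feature": f, "threshold": t,
        "left": build_tree(X[go_left], lo, left_hi, depth + 1, max_depth, rng),
        "right": build_tree(X[~go_left], right_lo, hi, depth + 1, max_depth, rng),
    }


def fit_forest(X, n_trees=100, sample_size=256, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)  # root box: bounding box of the data
    psi = min(sample_size, len(X))
    max_depth = int(np.ceil(np.log2(psi)))
    trees = [
        build_tree(X[rng.choice(len(X), size=psi, replace=False)], lo, hi, 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return {"trees": trees, "psi": psi, "root_log_vol": float(np.sum(np.log(hi - lo)))}


def per_tree_values(forest, x):
    """Per-tree path lengths and leaf log-volumes (relative to the root box) for x."""
    depths, log_vols = [], []
    for node in forest["trees"]:
        while "feature" in node:
            node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
        depths.append(node["depth"] + c_factor(node["size"]))
        log_vols.append(node["log_vol"] - forest["root_log_vol"])
    return np.array(depths), np.array(log_vols)


def standard_score(depths, psi):
    """Standard Isolation Forest score: depends only on the mean path length."""
    return float(2.0 ** (-depths.mean() / c_factor(psi)))


def power_mean_score(depths, psi, alpha):
    """ASSUMPTION: illustrative distribution-aware aggregate, not the paper's formula.

    Power mean of the per-tree scores; alpha = 0 is the geometric mean, which
    coincides with the standard score.
    """
    s = 2.0 ** (-depths / c_factor(psi))
    if alpha == 0:
        return float(np.exp(np.log(s).mean()))
    return float(np.mean(s ** alpha) ** (1.0 / alpha))


def volume_score(log_vols):
    """ASSUMPTION: mean relative log leaf volume; larger means more anomalous."""
    return float(log_vols.mean())


# Toy data: a Gaussian cluster plus one obvious outlier at (8, 8).
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(size=(500, 2)), [[8.0, 8.0]]])
forest = fit_forest(X)
d_in, v_in = per_tree_values(forest, np.array([0.0, 0.0]))
d_out, v_out = per_tree_values(forest, np.array([8.0, 8.0]))
for name, d, v in [("inlier", d_in, v_in), ("outlier", d_out, v_out)]:
    print(name, standard_score(d, forest["psi"]),
          power_mean_score(d, forest["psi"], 2.0), volume_score(v))
```

The standard score is exactly the geometric mean of the per-tree scores, so any aggregate that depends on more than that mean (here, the power mean with `alpha` different from 0) uses extra information about how scores are distributed across trees. The volume variant relies on the observation that points isolated early tend to sit in large leaf boxes, while points in dense regions end up in small ones.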

Code Implementations: 1 repo