Theoretical Investigation on Inductive Bias of Isolation Forest
This work gives researchers and practitioners who use iForest a theoretical account of its behavior, though the contribution is incremental: it builds on existing methods without introducing a new paradigm.
The paper addresses the lack of a theoretical foundation for Isolation Forest (iForest) by investigating its inductive bias. It models tree growth as a random walk and derives an expected depth function that explains when and how well iForest performs. Case studies show that, compared to k-Nearest Neighbor, iForest is less sensitive to central anomalies and offers greater parameter adaptability.
Isolation Forest (iForest) stands out as a widely used unsupervised anomaly detector, primarily owing to its remarkable runtime efficiency and superior performance in large-scale tasks. Despite its widespread adoption, the theoretical foundation explaining iForest's success remains unclear. This paper focuses on the inductive bias of iForest, which theoretically elucidates under what circumstances and to what extent iForest works well. The key is to formulate the growth process of iForest, where the split dimensions and split values are randomly selected. We model the growth process of iForest as a random walk, enabling us to derive the expected depth function, which is the outcome of iForest, using transition probabilities. The case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor. Our study provides a theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.
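To make the notion of an expected depth concrete, the sketch below estimates it by Monte Carlo simulation for a one-dimensional dataset. It is an illustration only, not the paper's derivation: the abstract describes a closed-form treatment via random-walk transition probabilities, whereas this code simply grows many random trees and averages the depth at which a point is isolated. The function names (`isolation_depth`, `expected_depth`), the toy dataset, and the omission of subsampling and height limits are all assumptions made for brevity.

```python
import random


def isolation_depth(points, target, rng):
    """Depth at which `target` is isolated in one randomly grown 1-D tree.

    Simplifying assumptions (not from the paper): no subsampling,
    no height limit, and `target` is itself a member of `points`.
    """
    current = list(points)
    depth = 0
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:
            # Remaining points are identical and cannot be separated.
            break
        # Split value drawn uniformly from the current range.
        split = rng.uniform(lo, hi)
        # Follow the branch that contains the target.
        if target < split:
            current = [p for p in current if p < split]
        else:
            current = [p for p in current if p >= split]
        depth += 1
    return depth


def expected_depth(points, target, n_trees=1000, seed=0):
    """Monte Carlo estimate of the expected isolation depth of `target`."""
    rng = random.Random(seed)
    total = sum(isolation_depth(points, target, rng) for _ in range(n_trees))
    return total / n_trees


# Hypothetical toy data: a dense cluster on [0, 1] plus one marginal outlier.
points = [i / 10 for i in range(11)] + [5.0]

if __name__ == "__main__":
    print("outlier 5.0 :", expected_depth(points, 5.0))
    print("inlier  0.5 :", expected_depth(points, 0.5))
```

A smaller estimated depth means the point is isolated sooner, which iForest interprets as more anomalous. In this toy setup the marginal point at 5.0 is usually cut off by the first split, while a point inside the cluster needs several splits.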