ML, LG, AP · Oct 9, 2023

Enhancing Interpretability and Generalizability in Extended Isolation Forests

Alessio Arcudi, Davide Frizzo, Chiara Masiero, Gian Antonio Susto

arXiv:2310.05468v3 · 8.6 · 10 citations · h-index: 27 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the need for clear explanations and better generalization in anomaly detection for engineering diagnostics and maintenance. It is an incremental advance, since it builds on existing Extended Isolation Forest methods.

This paper tackles interpretability and generalization in anomaly detection by introducing two methods: ExIFFI, which explains predictions from Extended Isolation Forest (EIF) models, and EIF+, a revised version of EIF designed to better detect unseen anomalies. When trained without anomalies, EIF+ outperforms EIF on all datasets, indicating better generalization, and ExIFFI outperforms other unsupervised interpretability methods on 8 of 11 real-world datasets.

Anomaly Detection (AD) focuses on identifying unusual behaviors in complex datasets. Machine Learning (ML) algorithms and Decision Support Systems (DSSs) provide effective solutions for AD, but detecting anomalies alone may not be enough, especially in engineering, where diagnostics and maintenance are crucial. Users need clear explanations to support root cause analysis and build trust in the model. The unsupervised nature of AD, however, makes interpretability a challenge. This paper introduces Extended Isolation Forest Feature Importance (ExIFFI), a method that explains predictions made by Extended Isolation Forest (EIF) models, which split data using hyperplanes. ExIFFI provides explanations at both global and local levels by leveraging feature importance. We also present an improved version, Enhanced Extended Isolation Forest (EIF+), designed to enhance the model's ability to detect unseen anomalies through a revised splitting strategy. Using five synthetic and eleven real-world datasets, we conduct a comparative analysis, evaluating unsupervised AD methods with the Average Precision metric. EIF+ consistently outperforms EIF across all datasets when trained without anomalies, demonstrating better generalization. To assess ExIFFI's interpretability, we introduce the Area Under the Curve of Feature Selection (AUC_FS), a novel metric using feature selection as a proxy task. ExIFFI outperforms other unsupervised interpretability methods on 8 of 11 real-world datasets and successfully identifies anomalous features in synthetic datasets. When trained only on inliers, ExIFFI also outperforms competing models on real-world data and accurately detects anomalous features in synthetic datasets. We provide open-source code to encourage further research and reproducibility.
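The abstract notes that EIF models split data using hyperplanes. The following is a minimal, simplified sketch of that idea, not the authors' implementation. Each node draws a random normal vector with standard Gaussian components, projects its points onto it, and picks a threshold uniformly between the smallest and largest projection. This is a simplification of the original EIF, which draws an intercept point inside the range of the data at each branch. Scores use the 2^(-E[h]/c(n)) normalisation from the original Isolation Forest. The revised EIF+ splitting strategy is not reproduced here, and all function names (`build_tree`, `fit_forest`, `anomaly_score`) are hypothetical.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful BST search over n points,
    the normalisation constant used by Isolation Forest."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split `points` with random hyperplanes (simplified EIF-style)."""
    n = len(points)
    if depth >= max_depth or n <= 1:
        return {"size": n}
    # Random direction: one Gaussian component per feature.
    normal = [rng.gauss(0.0, 1.0) for _ in range(len(points[0]))]
    proj = [sum(w * x for w, x in zip(normal, p)) for p in points]
    lo, hi = min(proj), max(proj)
    if lo == hi:  # points cannot be separated along this direction
        return {"size": n}
    threshold = rng.uniform(lo, hi)
    left = [p for p, v in zip(points, proj) if v <= threshold]
    right = [p for p, v in zip(points, proj) if v > threshold]
    return {
        "normal": normal,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(size) for the unexpanded leaf."""
    if "normal" not in node:
        return depth + c(node["size"])
    v = sum(w * xi for w, xi in zip(node["normal"], x))
    child = node["left"] if v <= node["threshold"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees trees, each on a random subsample (needs >= 2 points)."""
    rng = random.Random(seed)
    m = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(data, m), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, m


def anomaly_score(x, forest):
    """Score in (0, 1); values closer to 1 mean the point is isolated sooner."""
    trees, m = forest
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(m))


if __name__ == "__main__":
    gen = random.Random(1)
    # Train on inliers only, then score an unseen far-away point.
    inliers = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(256)]
    forest = fit_forest(inliers)
    print(anomaly_score([0.0, 0.0], forest), anomaly_score([8.0, 8.0], forest))
```

Training only on inliers mirrors the "trained without anomalies" setting in the abstract. The far-away point receives a higher score than the centre of the data because random hyperplanes separate it from the rest in fewer splits.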
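The evaluation relies on Average Precision and on AUC_FS, which uses feature selection as a proxy task. The sketch below shows one plausible reading of that proxy under stated assumptions. It is not the paper's exact definition of AUC_FS. Features are ranked by an importance vector, and a detector is re-run on progressively smaller feature subsets. The score is the average gap between two AP curves: one where the least important features are dropped first, and one where the most important are dropped first. The detector `distance_detector` (squared distance from the feature-wise mean) and every other name here are hypothetical stand-ins that keep the example dependency-free.

```python
from typing import Callable, List, Sequence

Detector = Callable[[List[List[float]], List[int]], List[float]]


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@k over the ranks k of the true anomalies
    (labels: 1 = anomaly, 0 = inlier; higher score = more anomalous)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def distance_detector(data: List[List[float]], features: List[int]) -> List[float]:
    """Hypothetical detector: squared distance from the mean on `features`."""
    means = [sum(row[f] for row in data) / len(data) for f in features]
    return [sum((row[f] - m) ** 2 for f, m in zip(features, means)) for row in data]


def selection_curve(data: List[List[float]], labels: List[int],
                    ranking: List[int], detector: Detector) -> List[float]:
    """AP when keeping only the first k features of `ranking`, k = d, ..., 1."""
    return [average_precision(detector(data, ranking[:k]), labels)
            for k in range(len(ranking), 0, -1)]


def feature_selection_score(data: List[List[float]], labels: List[int],
                            importances: Sequence[float],
                            detector: Detector = distance_detector) -> float:
    """Positive when dropping unimportant features first preserves AP better
    than dropping important features first (hypothetical AUC_FS-like proxy)."""
    d = len(importances)
    most_first = sorted(range(d), key=lambda f: importances[f], reverse=True)
    keep_best = selection_curve(data, labels, most_first, detector)
    keep_worst = selection_curve(data, labels, most_first[::-1], detector)
    # Each curve has d points; averaging approximates a normalised area.
    return (sum(keep_best) - sum(keep_worst)) / d


if __name__ == "__main__":
    # Feature 0 carries the anomaly signal; features 1 and 2 are noise.
    data = [[0.0, (i % 5) - 2.0, ((3 * i) % 7) - 3.0] for i in range(20)]
    data += [[10.0, 0.0, 0.0], [-10.0, 1.0, -1.0]]
    labels = [0] * 20 + [1, 1]
    print(feature_selection_score(data, labels, [1.0, 0.2, 0.1]))  # good ranking
    print(feature_selection_score(data, labels, [0.0, 1.0, 1.0]))  # poor ranking
```

A ranking that places the informative feature first keeps AP high while noise features are removed, so it yields a positive score. A ranking that places it last yields a negative one.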
