LG · ML · Jul 19, 2025

Glitches in Decision Tree Ensemble Models

arXiv:2507.14492v1
Originality: Highly original
AI Analysis

This work addresses reliability issues in AI models used for critical decision-making, although its contribution is incremental in that it targets a single, specific source of inconsistency.

The paper identifies 'glitches', small input neighborhoods in which the output of a model with steep decision boundaries oscillates abruptly. It demonstrates that glitches are widespread in gradient-boosted decision tree (GBDT) models and proves that detecting them in tree ensembles is NP-complete, already for trees of depth 4.
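To make the notion concrete, here is a minimal sketch for the simplest possible case: a GBDT-style ensemble of depth-1 trees (stumps) on a single feature, whose raw score is the sum of the leaf values the trees assign. The glitch criterion is an assumption for illustration, not the paper's formal definition: a constant piece of the score narrower than `eps` that lies at least `delta` above both neighboring pieces (a spike) or at least `delta` below both (a dip). The names `Stump`, `score`, and `find_spikes` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stump:
    """Hypothetical depth-1 regression tree on a single feature x."""
    threshold: float
    left: float   # leaf value when x < threshold
    right: float  # leaf value when x >= threshold


def score(ensemble: List[Stump], x: float) -> float:
    """Raw GBDT-style score: the sum of the leaf values reached by x."""
    return sum(s.left if x < s.threshold else s.right for s in ensemble)


def find_spikes(ensemble: List[Stump], eps: float, delta: float) -> List[Tuple[float, float]]:
    """Return pieces [lo, hi) of width < eps whose score is >= delta above
    both neighboring pieces or >= delta below both (assumed glitch criterion).

    Only adjacent pieces are compared, so glitches spanning several
    narrow pieces are missed by this simplified check.
    """
    cuts = sorted({s.threshold for s in ensemble})
    if not cuts:
        return []
    # The score is piecewise constant: piece 0 is (-inf, cuts[0]) and
    # piece j >= 1 is [cuts[j-1], cuts[j]), the last one extending to +inf.
    values = [score(ensemble, cuts[0] - 1.0)] + [score(ensemble, c) for c in cuts]
    spikes = []
    for j in range(1, len(cuts)):  # interior pieces have finite width
        lo, hi = cuts[j - 1], cuts[j]
        if hi - lo >= eps:
            continue
        left, mid, right = values[j - 1], values[j], values[j + 1]
        if mid - max(left, right) >= delta or min(left, right) - mid >= delta:
            spikes.append((lo, hi))
    return spikes


model = [Stump(0.0, 0.0, 1.0), Stump(0.5, 0.0, 0.3), Stump(0.52, 0.0, -0.3)]
print(find_spikes(model, eps=0.05, delta=0.2))  # [(0.5, 0.52)]
```

In this toy model two stumps raise the score by 0.3 on a 0.02-wide sliver and then cancel, so a tiny move in x makes the output jump up and back down. Real GBDT models have many features and deeper trees, which is why a 1D scan like this does not scale and the paper turns to an MILP-based search instead.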

Many critical decision-making tasks are now delegated to machine-learned models, and it is imperative that their decisions are trustworthy and reliable, and that their outputs are consistent across similar inputs. We identify a new source of unreliable behavior, called glitches, which may significantly impair the reliability of AI models with steep decision boundaries. Roughly speaking, glitches are small neighborhoods in the input space where the model's output oscillates abruptly in response to small changes in the input. We provide a formal definition of glitches and use well-known models and datasets from the literature to demonstrate that they are widespread, and we argue that they usually indicate potential model inconsistencies in the neighborhood where they are found. We then turn to the algorithmic search for glitches in widely used gradient-boosted decision tree (GBDT) models. We prove that the problem of detecting glitches is NP-complete for tree ensembles, already for trees of depth 4. Our glitch-search algorithm for GBDT models uses a mixed-integer linear programming (MILP) encoding of the problem, and its effectiveness and computational feasibility are demonstrated on a set of widely used GBDT benchmarks from the literature.
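The abstract says only that the search uses an MILP encoding. One plausible shape for such an encoding, sketched here under stated assumptions, applies the standard leaf-indicator formulation of tree ensembles to three points x^a, x^b, x^c. Binary z^k_{t,l} selects the leaf of tree t reached by point k, binary p^k_{i,tau} equals 1 exactly when x^k_i < tau for a split (i, tau), and v_{t,l} is the leaf value. The last two lines (midpoint x^b, per-feature radius epsilon, score gap delta), the margin gamma, and the big-M constant are illustrative assumptions, not the authors' formulation.

```latex
\begin{aligned}
&\textstyle\sum_{\ell \in L_t} z^k_{t,\ell} = 1
  && \forall t,\; k \in \{a,b,c\} \\
&z^k_{t,\ell} \le p^k_{i,\tau}
  && \text{if } \ell \text{ lies below the } x_i < \tau \text{ branch of split } (i,\tau) \\
&z^k_{t,\ell} \le 1 - p^k_{i,\tau}
  && \text{if } \ell \text{ lies below the } x_i \ge \tau \text{ branch of split } (i,\tau) \\
&x^k_i \le \tau - \gamma + M\,(1 - p^k_{i,\tau}), \quad x^k_i \ge \tau - M\,p^k_{i,\tau}
  && \forall (i,\tau),\; k \\
&f^k = \textstyle\sum_t \sum_{\ell \in L_t} v_{t,\ell}\, z^k_{t,\ell}
  && \forall k \\
&x^b = \tfrac{1}{2}\,(x^a + x^c), \quad -\varepsilon \le x^a_i - x^c_i \le \varepsilon
  && \forall i \\
&f^b - f^a \ge \delta, \quad f^b - f^c \ge \delta, \quad
  z^k_{t,\ell},\, p^k_{i,\tau} \in \{0,1\}
\end{aligned}
```

With p = 1 the linking constraints force x_i <= tau - gamma (left branch), and with p = 0 they force x_i >= tau (right branch), so the selected leaf is the one each point actually reaches. Replacing the score-gap constraints by f^a - f^b >= delta and f^c - f^b >= delta would search for a dip instead of a spike.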
