LGAug 27, 2025

Delta-Audit: Explaining What Changes When Models Change

arXiv:2508.19589v11 citationsh-index: 2ICCKE
Originality Incremental advance
AI Analysis

This provides a tool for practitioners to audit model updates, though it is incremental as it builds on existing attribution methods.

The paper tackles the problem of explaining why model performance changes when models are updated, by introducing Delta-Attribution, a model-agnostic framework that differences per-feature attributions to identify what changed between versions, and finds it effectively distinguishes meaningful changes from cosmetic tweaks, with metrics like BAC≈0.998 and DCE≈6.6 for inductive-bias changes.

Model updates (new hyperparameters, kernels, depths, solvers, or data) change performance, but the \emph{reason} often remains opaque. We introduce \textbf{Delta-Attribution} (\mbox{$Δ$-Attribution}), a model-agnostic framework that explains \emph{what changed} between versions $A$ and $B$ by differencing per-feature attributions: $Δφ(x)=φ_B(x)-φ_A(x)$. We evaluate $Δφ$ with a \emph{$Δ$-Attribution Quality Suite} covering magnitude/sparsity (L1, Top-$k$, entropy), agreement/shift (rank-overlap@10, Jensen--Shannon divergence), behavioural alignment (Delta Conservation Error, DCE; Behaviour--Attribution Coupling, BAC; CO$Δ$F), and robustness (noise, baseline sensitivity, grouped occlusion). Instantiated via fast occlusion/clamping in standardized space with a class-anchored margin and baseline averaging, we audit 45 settings: five classical families (Logistic Regression, SVC, Random Forests, Gradient Boosting, $k$NN), three datasets (Breast Cancer, Wine, Digits), and three A/B pairs per family. \textbf{Findings.} Inductive-bias changes yield large, behaviour-aligned deltas (e.g., SVC poly$\!\rightarrow$rbf on Breast Cancer: BAC$\approx$0.998, DCE$\approx$6.6; Random Forest feature-rule swap on Digits: BAC$\approx$0.997, DCE$\approx$7.5), while ``cosmetic'' tweaks (SVC \texttt{gamma=scale} vs.\ \texttt{auto}, $k$NN search) show rank-overlap@10$=1.0$ and DCE$\approx$0. The largest redistribution appears for deeper GB on Breast Cancer (JSD$\approx$0.357). $Δ$-Attribution offers a lightweight update audit that complements accuracy by distinguishing benign changes from behaviourally meaningful or risky reliance shifts.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes