5.1 | LG | May 15
AGOP-IxG: A Gradient Covariance Filter for Local Feature Attribution on Tabular Data, with a Controlled Benchmark
Raj Kiran Gupta Katakam
Automated machine learning pipelines increasingly produce models whose predictions must be explained to end users, auditors, and downstream decision systems. The most widely used feature attribution methods (SHAP, Integrated Gradients, LIME) are typically chosen by convention rather than measured fidelity, because rigorous evaluation is impeded by the absence of ground-truth attribution on real data. We propose AGOP-IxG, a fast per-sample attribution method for tabular classifiers that pre-multiplies the per-sample gradient by a top-$K$ rank-truncated Average Gradient Outer Product matrix, and evaluate it against four widely-used baselines on a controlled tabular benchmark designed for AutoML practitioners. In Part 1, we construct three synthetic multi-class tabular tasks (linear, sparse nonlinear, interaction-based) where ground-truth attribution per sample is analytically or numerically derivable, and compare five methods: AGOP-IxG, SHAP (DeepExplainer), Integrated Gradients, InputXGradient, and LIME. AGOP-IxG leads on Spearman rank correlation and noise feature mass on all three synthetic datasets, and on top-$k$ precision on the interaction dataset. Across all settings, AGOP-IxG is approximately $350\times$ to $1{,}650\times$ faster than SHAP. In Part 2, we evaluate global faithfulness on Adult Income and Credit Card Default using the ROAR protocol; the methods cluster within $\sim 1.7\%$ relative AUC, consistent with AGOP-IxG being optimized for per-sample local attribution rather than global feature ranking.
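The core operation described in this abstract, pre-multiplying a per-sample gradient by a top-$K$ rank-truncated AGOP matrix, can be illustrated with a minimal NumPy sketch. The function names (`agop`, `top_k_truncate`, `agop_ixg`) are hypothetical, and the final elementwise product with the input is an assumption inferred from the "IxG" (Input x Gradient) name; the abstract does not spell out the exact formula, so this should be read as one plausible instance rather than the author's implementation.

```python
import numpy as np

def agop(grads):
    """Average Gradient Outer Product over a set of per-sample gradients.

    grads: array of shape (n_samples, n_features), one input gradient per row.
    Returns the (n_features, n_features) matrix (1/n) * sum_i g_i g_i^T.
    """
    grads = np.asarray(grads, dtype=float)
    return grads.T @ grads / grads.shape[0]

def top_k_truncate(M, k):
    """Keep only the k largest eigen-components of a symmetric PSD matrix."""
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    vals = eigvals[::-1][:k]               # k largest eigenvalues
    vecs = eigvecs[:, ::-1][:, :k]         # matching eigenvectors (columns)
    return (vecs * vals) @ vecs.T          # V diag(vals) V^T

def agop_ixg(x, grad_x, M_k):
    """Assumed form: input times the AGOP-filtered gradient, elementwise."""
    x = np.asarray(x, dtype=float)
    grad_x = np.asarray(grad_x, dtype=float)
    return x * (M_k @ grad_x)

# Toy data: feature 0 dominates the gradients, feature 2 never matters.
train_grads = np.array([[1.0, 0.0, 0.0],
                        [2.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
M = agop(train_grads)
M_1 = top_k_truncate(M, k=1)
attribution = agop_ixg(x=[2.0, 2.0, 2.0], grad_x=[1.0, 1.0, 1.0], M_k=M_1)
```

In this toy case the rank-1 filter keeps only the direction of feature 0, so gradient mass on the other two features is suppressed even though the per-sample gradient is uniform. In practice the gradients would come from the trained classifier via automatic differentiation, and $K$ would be a tuned hyperparameter.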
3.7 | LG | May 12
AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers
Raj Kiran Gupta Katakam
The Average Gradient Outer Product (AGOP) governs feature learning in neural networks: the Neural Feature Ansatz states that weight Gram matrices at each layer align with the corresponding AGOP matrices computed over the training distribution. We ask a complementary question: can this same quantity serve as a post-hoc attribution method for explaining individual predictions? We introduce AGOP-Weighted: a novel attribution method that multiplies the per-sample gradient by sqrt(diag(M) / max diag(M)), a training-distribution prior that suppresses gradient noise and amplifies consistently important pixels -- a combination not present in any prior attribution method. We formalise two companion variants -- AGOP-Local (per-sample gradient, equivalent to VanillaGrad) and AGOP-Global (diag(M) directly as a zero-cost saliency map) -- and implement an efficient training-time accumulation hook; AGOP-Global then requires zero inference cost (disk lookup) while AGOP-Weighted requires only a single gradient pass. We conduct the first rigorous comparison of AGOP attribution against Integrated Gradients (IG), SmoothGrad, GradCAM, and VanillaGrad across two benchmarks with pixel-level ground truth: (i) the synthetic XAI-TRIS benchmark (four classification scenarios, 8x8 images, CNN8by8) and (ii) the photorealistic CLEVR-XAI benchmark (ResNet-18 fine-tuned from ImageNet). AGOP-Weighted achieves 44% higher mIoU than IG on linear tasks; AGOP-Global achieves 7x higher mIoU than IG on multiplicative tasks (where IG falls below random) at zero inference cost. Both findings generalise to ResNet-18 on CLEVR-XAI (+18% and +37% respectively). We further show that GradCAM fails on small-resolution images due to spatial resolution collapse, and that diag(M) quality improves monotonically throughout training even after classification accuracy has plateaued.
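The three variants named in this abstract can be sketched in a few lines of NumPy, assuming that diag(M) is the per-pixel mean of squared input gradients over the training set (which is what the diagonal of an average gradient outer product reduces to). The class and function names below are hypothetical, the small `eps` guard is an addition for numerical safety, and whether the gradient is taken in absolute value is not stated in the abstract, so the signed gradient is kept here.

```python
import numpy as np

class DiagAgopAccumulator:
    """Running estimate of diag(M), intended to be updated during training."""

    def __init__(self, n_features):
        self.sum_sq = np.zeros(n_features)
        self.count = 0

    def update(self, grads):
        # grads: (batch, n_features) input gradients for one training batch
        grads = np.atleast_2d(np.asarray(grads, dtype=float))
        self.sum_sq += (grads ** 2).sum(axis=0)
        self.count += grads.shape[0]

    def diag(self):
        # Mean squared gradient per feature; zeros if nothing was accumulated
        return self.sum_sq / max(self.count, 1)

def agop_local(grad_x):
    """AGOP-Local: the per-sample gradient itself (VanillaGrad)."""
    return np.asarray(grad_x, dtype=float)

def agop_global(diag_m):
    """AGOP-Global: diag(M) used directly as a sample-independent saliency map."""
    return np.asarray(diag_m, dtype=float)

def agop_weighted(grad_x, diag_m, eps=1e-12):
    """AGOP-Weighted: gradient scaled by sqrt(diag(M) / max diag(M))."""
    diag_m = np.asarray(diag_m, dtype=float)
    prior = np.sqrt(diag_m / (diag_m.max() + eps))  # eps avoids division by zero
    return np.asarray(grad_x, dtype=float) * prior

# Toy usage with flattened two-pixel "images".
acc = DiagAgopAccumulator(n_features=2)
acc.update([[2.0, 1.0], [2.0, 1.0]])
diag_m = acc.diag()
weighted = agop_weighted(grad_x=[1.0, 1.0], diag_m=diag_m)
```

Because the prior depends only on the accumulated training-time statistics, it can be stored once and reused, which matches the abstract's description of AGOP-Global as a lookup and AGOP-Weighted as a single gradient pass plus an elementwise product.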