LG
Sep 30, 2025

The Loss Kernel: A Geometric Probe for Deep Learning Interpretability

arXiv:2509.26537v1
7 citations
h-index: 3
Originality: Incremental advance
AI Analysis

The loss kernel gives deep learning researchers a practical interpretability tool, though the advance is incremental because it builds on existing perturbation-based methods.

The paper addresses neural network interpretability by introducing the loss kernel, which measures similarity between data points as the covariance of their losses under parameter perturbations. It demonstrates the method by separating inputs by task in a synthetic multitask problem and by showing that the kernel's structure on ImageNet aligns with the WordNet hierarchy.

We introduce the loss kernel, an interpretability method for measuring similarity between data points according to a trained neural network. The kernel is the covariance matrix of per-sample losses computed under a distribution of low-loss-preserving parameter perturbations. We first validate our method on a synthetic multitask problem, showing it separates inputs by task as predicted by theory. We then apply this kernel to Inception-v1 to visualize the structure of ImageNet, and we show that the kernel's structure aligns with the WordNet semantic hierarchy. This establishes the loss kernel as a practical tool for interpretability and data attribution.
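
As a rough illustration of the definition in the abstract, the sketch below estimates a loss kernel for a toy linear model. It is not the paper's implementation: the isotropic Gaussian perturbations around the trained weights are a simple stand-in for the "low-loss-preserving" perturbation distribution (which the abstract does not specify), and the names `loss_kernel`, `squared_error`, `n_draws`, and `scale` are hypothetical. The toy data mimics a two-task setup in which each task depends on a disjoint set of weights.

```python
import numpy as np


def loss_kernel(per_sample_loss, w_star, inputs, targets,
                n_draws=2000, scale=0.05, seed=0):
    """Covariance of per-sample losses over perturbed parameters.

    Assumption: perturbations are isotropic Gaussian noise around w_star.
    This is an illustrative stand-in, not the sampler used in the paper.
    """
    rng = np.random.default_rng(seed)
    draws = w_star + scale * rng.standard_normal((n_draws, w_star.size))
    # losses[k, i] is the loss of data point i under the k-th perturbed weights.
    losses = np.stack([per_sample_loss(w, inputs, targets) for w in draws])
    # Columns are data points, so rowvar=False gives a (points x points) matrix.
    return np.cov(losses, rowvar=False)


def squared_error(w, inputs, targets):
    """Per-sample squared error of a linear model (toy loss)."""
    return (inputs @ w - targets) ** 2


# Toy two-task data: task A uses features 0-1, task B uses features 2-3.
rng = np.random.default_rng(1)
w_star = rng.standard_normal(4)
x_a = np.zeros((3, 4))
x_a[:, :2] = rng.standard_normal((3, 2))
x_b = np.zeros((3, 4))
x_b[:, 2:] = rng.standard_normal((3, 2))
inputs = np.vstack([x_a, x_b])
targets = inputs @ w_star  # w_star fits the data exactly in this toy setup

kernel = loss_kernel(squared_error, w_star, inputs, targets)

# Average magnitude of same-task entries versus cross-task entries.
within = (np.abs(kernel[:3, :3]).mean() + np.abs(kernel[3:, 3:]).mean()) / 2
cross = np.abs(kernel[:3, 3:]).mean()
print(f"within-task: {within:.2e}, cross-task: {cross:.2e}")
```

In this toy case the two tasks depend on independent weight coordinates, so the cross-task entries of the kernel should be close to zero while same-task entries are not, giving an approximately block-diagonal matrix. For a real network, the perturbation distribution and per-sample loss would need to be replaced with whatever the paper actually uses.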
