Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning

arXiv:2605.07980

AI Analysis

For researchers in Bayesian deep learning and interpretability, this primer provides a theoretical framework linking statistical mechanics to neural network analysis. It is expository rather than a novel contribution.

This primer introduces the theory of susceptibilities for interpreting neural networks, defining them as derivatives of posterior expectations that, by the fluctuation-dissipation theorem, equal posterior covariances. It shows how different observables yield the influence matrix and the structural susceptibility matrix; the pseudo-inverse of the latter gives a linearized solution to the patterning problem of finding data perturbations that produce a desired structural change.

These notes introduce the theory of susceptibilities as developed in [arXiv:2504.18274, arXiv:2601.12703] for interpreting neural networks. The susceptibility of an observable $ϕ$ to a data perturbation is defined as a derivative of a posterior expectation, which by the fluctuation-dissipation theorem equals a posterior covariance. Different choices of $ϕ$ yield different objects: per-sample losses give the influence matrix (the Bayesian influence function of [arXiv:2509.26544]), while component-localized observables give the structural susceptibility matrix that pairs model components with data patterns. The susceptibility matrix is (up to a factor of $nβ$) the Jacobian of the map from data distributions to structural coordinates; its pseudo-inverse provides a linearized solution to the patterning problem of [arXiv:2601.13548]: finding data perturbations that produce a desired structural change. We motivate the theory from its statistical-mechanical foundations, then give a detailed exposition of susceptibilities, their empirical estimators, and their connection to the geometry of the loss landscape.
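
The two central claims of the abstract (derivative equals covariance, and pseudo-inverse as a linearized solution) can be checked numerically on a toy problem. The sketch below is not taken from the cited papers. It assumes a one-dimensional parameter w on a grid, a flat prior, a tempered posterior proportional to exp(-nβ · loss), three made-up data patterns with quadratic losses, and a data perturbation modeled as an unnormalized shift h of the pattern weights. The observables w and w² stand in for structural coordinates. Under these assumptions the Jacobian of the posterior expectations with respect to h is -nβ times a posterior covariance; the sign and normalization used in the primer may differ.

```python
import numpy as np

# Hypothetical toy setup: a 1-D parameter on a grid, so expectations are exact sums.
w = np.linspace(-3.0, 3.0, 2001)
n_beta = 20.0  # plays the role of n * beta (assumed value)

# Losses of three made-up data patterns and their base mixture weights.
pattern_losses = np.stack([
    0.5 * (w - 1.0) ** 2,
    0.5 * (w + 1.0) ** 2,
    w ** 2,
])  # shape (3, grid)
base_weights = np.array([0.5, 0.25, 0.25])

# Observables standing in for "structural coordinates".
observables = np.stack([w, w ** 2])  # shape (2, grid)


def posterior(h):
    """Tempered posterior (flat prior) after shifting the pattern weights by h."""
    loss = (base_weights + h) @ pattern_losses
    log_p = -n_beta * loss
    p = np.exp(log_p - log_p.max())  # subtract the max for numerical stability
    return p / p.sum()


def expectations(h):
    """Posterior expectation of each observable under the perturbed data weights."""
    return observables @ posterior(h)


def jacobian_from_covariance():
    """Fluctuation-dissipation form: J[i, j] = -n_beta * Cov(phi_i, loss_j)."""
    p = posterior(np.zeros(3))
    phi_centered = observables - (observables @ p)[:, None]
    loss_centered = pattern_losses - (pattern_losses @ p)[:, None]
    return -n_beta * (phi_centered * p) @ loss_centered.T


def jacobian_from_finite_differences(eps=1e-5):
    """Direct derivative of the expectations with respect to each weight shift."""
    columns = []
    for j in range(3):
        step = np.zeros(3)
        step[j] = eps
        columns.append((expectations(step) - expectations(-step)) / (2 * eps))
    return np.stack(columns, axis=1)


J_cov = jacobian_from_covariance()
J_fd = jacobian_from_finite_differences()
print("max |J_cov - J_fd| =", np.abs(J_cov - J_fd).max())

# Linearized "patterning": pick a small desired change in the expectations and
# solve for the minimum-norm weight shift with the pseudo-inverse.
target = np.array([1e-3, 5e-4])
h = np.linalg.pinv(J_cov) @ target
achieved = expectations(h) - expectations(np.zeros(3))
print("target  :", target)
print("achieved:", achieved)
```

The covariance form needs only quantities evaluated at the unperturbed posterior, which is why it can be estimated from posterior samples in practice; the finite-difference version is included here only as a check. The pseudo-inverse step is exact for the linearized map, so the achieved change should match the target only up to second-order terms in h.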
