LG ML · Jun 13, 2025

Fidelity Isn't Accuracy: When Linearly Decodable Functions Fail to Match the Ground Truth

arXiv:2506.12176v3 · h-index: 1 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses neural network interpretability for researchers and practitioners, highlighting the risk of treating surrogate fidelity as a proxy for understanding. The advance is incremental, since it builds on existing diagnostic methods.

The paper introduces the linearity score λ(f) to measure how well a neural network's predictions can be mimicked by a linear model, finding that high scores indicate alignment with the network's outputs but do not guarantee accuracy with respect to the ground truth.

Neural networks excel as function approximators, but their complexity often obscures the types of functions they learn, making it difficult to explain their behavior. To address this, the linearity score $λ(f)$ is introduced, a simple and interpretable diagnostic that quantifies how well a regression network's output can be mimicked by a linear model. Defined as the $R^2$ value between the network's predictions and those of a trained linear surrogate, $λ(f)$ measures linear decodability: the extent to which the network's behavior aligns with a structurally simple model. This framework is evaluated on both synthetic and real-world datasets, using dataset-specific networks and surrogates. High $λ(f)$ scores reliably indicate alignment with the network's outputs; however, they do not guarantee accuracy with respect to the ground truth. These results highlight the risk of using surrogate fidelity as a proxy for model understanding, especially in high-stakes regression tasks.
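The distinction between fidelity and accuracy can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the linear surrogate is an ordinary least-squares fit (with intercept) to the network's predictions, evaluated on the same inputs, and it uses a hypothetical stand-in for the network (`f_pred`) and a hypothetical ground truth (`y_true`). The names `r2` and `linearity_score` are illustrative.

```python
import numpy as np


def r2(target, pred):
    """Coefficient of determination of `pred` with respect to `target`."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - np.mean(target)) ** 2)
    return 1.0 - ss_res / ss_tot


def linearity_score(f_pred, X):
    """R^2 between network predictions and a least-squares linear surrogate.

    Assumption: the surrogate is fit to the network's outputs (not to the
    ground truth) and scored on the same inputs it was fit on.
    """
    A = np.column_stack([X, np.ones(len(X))])  # features plus intercept column
    coef, *_ = np.linalg.lstsq(A, f_pred, rcond=None)
    return r2(f_pred, A @ coef)


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))

# Hypothetical nonlinear ground truth.
y_true = np.sin(3.0 * X[:, 0]) * X[:, 1]

# Hypothetical stand-in for a trained network that learned a linear map.
f_pred = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.3

lam = linearity_score(f_pred, X)  # fidelity of the surrogate to the network
acc = r2(y_true, f_pred)          # accuracy of the network on the ground truth

print(f"lambda(f) = {lam:.3f}, R^2 vs ground truth = {acc:.3f}")
```

In this toy setup the surrogate reproduces the stand-in network almost perfectly, so the linearity score is close to 1, while the network's own fit to the ground truth is poor. A high score here says only that the network behaves linearly, not that it is correct.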
