LG, AI · May 19, 2025

Understanding Task Representations in Neural Networks via Bayesian Ablation

arXiv:2505.13742v1 · h-index: 2 · CogSci
Originality: Highly original
AI Analysis

Aimed at researchers in cognitive modeling and AI, this work addresses interpretability in neural networks by offering a novel method for analyzing latent representations.

The authors tackle the challenge of interpreting learned task representations in neural networks by introducing a probabilistic framework based on Bayesian inference. The framework infers the causal contribution of each representational unit to task performance and provides tools for analyzing properties such as distributedness and complexity.

Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
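To make the abstract's idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It takes a hypothetical model with a handful of units, places an independent Bernoulli prior over which units are kept (the rest are ablated), weights every ablation mask by an assumed likelihood exp(beta · performance), and reads off each unit's posterior probability of being kept. Distributedness is then scored as the normalized entropy of how far each unit's posterior rises above the prior. The function names `posterior_keep_probs` and `distributedness`, the likelihood form, and the parameters `prior_keep` and `beta` are all illustrative choices, and exact enumeration of all 2^n masks only works for toy sizes.

```python
import itertools
import math
from typing import Callable, Sequence

# Hypothetical sketch, not the paper's method.
Mask = tuple[int, ...]  # 1 = unit kept, 0 = unit ablated


def posterior_keep_probs(perf: Callable[[Mask], float], n_units: int,
                         prior_keep: float = 0.3, beta: float = 5.0) -> list[float]:
    """Exact posterior P(unit i kept | good performance), enumerating all 2**n masks.

    Prior: each unit is kept independently with probability `prior_keep`
    (a value below 0.5 encodes a mild preference for ablating units).
    Likelihood (assumed form): exp(beta * perf(mask)).
    """
    z = 0.0
    kept_mass = [0.0] * n_units
    for mask in itertools.product((0, 1), repeat=n_units):
        k = sum(mask)
        prior = prior_keep ** k * (1 - prior_keep) ** (n_units - k)
        w = prior * math.exp(beta * perf(mask))  # unnormalized posterior weight
        z += w
        for i, m in enumerate(mask):
            if m:
                kept_mass[i] += w
    return [m / z for m in kept_mass]


def distributedness(post: Sequence[float], prior_keep: float) -> float:
    """Normalized entropy in [0, 1]: 0 = one unit carries the task, 1 = evenly spread."""
    # Importance = how far inference pushed each unit above its prior keep rate.
    imp = [max(p - prior_keep, 0.0) for p in post]
    total = sum(imp)
    if total == 0.0 or len(imp) < 2:
        return 0.0  # no unit stands out, or nothing to spread over
    probs = [x / total for x in imp]
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


# Usage with two hypothetical toy tasks over 4 units.
localized = lambda mask: float(mask[0])           # only unit 0 solves the task
distributed = lambda mask: sum(mask) / len(mask)  # every unit helps equally

for name, f in [("localized", localized), ("distributed", distributed)]:
    post = posterior_keep_probs(f, n_units=4)
    print(name, [round(p, 3) for p in post], round(distributedness(post, 0.3), 3))
```

With the localized toy, inference concentrates on unit 0 and the score sits near 0; when every unit contributes equally, the posterior spreads evenly and the score approaches 1. A real model would substitute task performance measured under each ablation mask for the toy `perf` functions and would likely need sampling or variational approximations instead of exhaustive enumeration.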
