AI · LG · NE · Jan 19

Actionable Interpretability Must Be Defined in Terms of Symmetries

arXiv:2601.12913v1 · 2 citations
Originality: Incremental advance
AI Analysis

The paper addresses the problem of making AI interpretability more practical and principled for researchers and practitioners. The advance appears incremental, since it builds on existing concepts.

The paper argues that current interpretability definitions in AI are not actionable and proposes that actionable interpretability must be defined in terms of symmetries, hypothesizing that four symmetries can motivate core properties, characterize interpretable models, and derive a unified formulation of interpretable inference.

This paper argues that interpretability research in Artificial Intelligence is fundamentally ill-posed as existing definitions of interpretability are not *actionable*: they fail to provide formal principles from which concrete modelling and inferential rules can be derived. We posit that for a definition of interpretability to be actionable, it must be given in terms of *symmetries*. We hypothesise that four symmetries suffice to (i) motivate core interpretability properties, (ii) characterize the class of interpretable models, and (iii) derive a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion.
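As a generic illustration of the two ingredients the abstract names, the block below writes out Bayesian inversion and one common way of stating a symmetry constraint (equivariance). The symbols $x$ (observation), $z$ (latent variable), $f$ (a model map), and $G$ (a group acting on inputs and outputs) are assumptions introduced here for illustration. They are not the paper's notation, and the block does not reproduce the four symmetries the paper proposes.

```latex
% Generic illustration only; x, z, f, g, G are assumed symbols, not the paper's notation.
% Bayesian inversion: the posterior over a latent z, obtained from a
% generative model p(x | z) and a prior p(z).
\[
  p(z \mid x) \;=\; \frac{p(x \mid z)\, p(z)}{\int p(x \mid z')\, p(z')\, \mathrm{d}z'}
\]
% A symmetry constraint in a standard form: equivariance of a map f
% under the action of a group G.
\[
  f(g \cdot x) \;=\; g \cdot f(x) \qquad \text{for all } g \in G
\]
```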
