AI, LG, NE · Jan 19

Actionable Interpretability Must Be Defined in Terms of Symmetries

Pietro Barbiero, Mateo Espinosa Zarlenga, Francesco Giannini, Alberto Termine, Filippo Bonchi, Mateja Jamnik, Giuseppe Marra

arXiv:2601.12913v1 · 6.0 · 2 citations

Originality: Incremental advance

AI Analysis

This paper addresses the problem of making AI interpretability more practical and principled for researchers and practitioners, though it appears to be an incremental advance because it builds on existing concepts.

The paper argues that current definitions of interpretability in AI are not actionable and proposes that actionable interpretability must be defined in terms of symmetries. It hypothesizes that four symmetries suffice to motivate core interpretability properties, characterize the class of interpretable models, and derive a unified formulation of interpretable inference.

This paper argues that interpretability research in Artificial Intelligence is fundamentally ill-posed as existing definitions of interpretability are not *actionable*: they fail to provide formal principles from which concrete modelling and inferential rules can be derived. We posit that for a definition of interpretability to be actionable, it must be given in terms of *symmetries*. We hypothesise that four symmetries suffice to (i) motivate core interpretability properties, (ii) characterize the class of interpretable models, and (iii) derive a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion.
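The abstract frames interpretable inference as a form of Bayesian inversion but does not give the model on this page. As a generic illustration only (not the authors' formulation, and ignoring the four symmetries, which the abstract does not name), the sketch below inverts a hypothetical forward model p(x | c), from a concept c to a model output x, into a posterior p(c | x) using Bayes' rule. The function `bayesian_inversion`, the concept and output labels, and all probabilities are made up for the example.

```python
# Generic Bayesian inversion over a discrete concept variable.
# Hypothetical toy setup: NOT the paper's method; all names and numbers are illustrative.

def bayesian_inversion(prior, likelihood, x):
    """Invert a forward model p(x | c) into a posterior p(c | x) via Bayes' rule.

    prior:      dict mapping concept value c -> p(c)
    likelihood: dict mapping (x, c) -> p(x | c)
    x:          the observed model output
    """
    # Unnormalised posterior: p(x | c) * p(c) for every concept value c.
    joint = {c: likelihood[(x, c)] * p_c for c, p_c in prior.items()}
    # Evidence p(x) = sum over c of p(x | c) * p(c).
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under the model")
    # Normalise so the posterior sums to one.
    return {c: v / evidence for c, v in joint.items()}


# Hypothetical concept prior and forward model (concept -> predicted label).
prior = {"striped": 0.3, "plain": 0.7}
likelihood = {
    ("zebra", "striped"): 0.9, ("horse", "striped"): 0.1,
    ("zebra", "plain"): 0.05, ("horse", "plain"): 0.95,
}

posterior = bayesian_inversion(prior, likelihood, "zebra")
print(posterior)
```

Observing the output "zebra" shifts most of the posterior mass onto the concept "striped", even though its prior is only 0.3; this is the sense in which inversion reads a concept-level explanation back out of a forward model.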

