LG · AI · Feb 16, 2025

Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens

arXiv:2502.11245v2 · 13 citations · h-index: 22
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of making concept-based models interpretable and reliable, which matters for AI safety and explainability. The advance is incremental because it builds on prior work on reasoning shortcuts.

The paper tackles the problem of reasoning shortcuts in concept-based models, where models achieve high accuracy using low-quality concepts, and derives theoretical conditions for identifying both concepts and inference layers, showing that existing methods often fail to meet these conditions in practice.

Concept-based Models are neural networks that learn a concept extractor to map inputs to high-level concepts and an inference layer to translate these into predictions. Ensuring these modules produce interpretable concepts and behave reliably in out-of-distribution settings is crucial, yet the conditions for achieving this remain unclear. We study this problem by establishing a novel connection between Concept-based Models and reasoning shortcuts (RSs), a common issue where models achieve high accuracy by learning low-quality concepts, even when the inference layer is fixed and provided upfront. Specifically, we extend RSs to the more complex setting of Concept-based Models and derive theoretical conditions for identifying both the concepts and the inference layer. Our empirical results highlight the impact of RSs and show that existing methods, even combined with multiple natural mitigation strategies, often fail to meet these conditions in practice.
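
To make the notion of a reasoning shortcut concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the two-bit inputs, the XOR and AND tasks, and all function names are assumptions chosen for illustration. It shows a concept extractor that gets every concept wrong yet reaches perfect label accuracy, first with a fixed inference layer and then with an inference layer that has adapted to the wrong concepts, which is the joint setting the abstract describes.

```python
from itertools import product

# Toy setup (illustrative assumption): each input IS its pair of
# ground-truth binary concepts, so the ideal extractor is the identity.
INPUTS = list(product([0, 1], repeat=2))


def true_concepts(x):
    return x


def flipped_extractor(x):
    # A low-quality extractor: predicts the opposite of every concept.
    return tuple(1 - bit for bit in x)


def xor_layer(c):
    return c[0] ^ c[1]


def and_layer(c):
    return c[0] & c[1]


def nor_layer(c):
    # Outputs 1 only when both predicted concepts are 0.
    return int(c[0] == 0 and c[1] == 0)


def label_accuracy(extractor, layer, target_layer):
    # Fraction of inputs whose predicted label matches the label that
    # target_layer assigns to the ground-truth concepts.
    hits = sum(
        layer(extractor(x)) == target_layer(true_concepts(x)) for x in INPUTS
    )
    return hits / len(INPUTS)


def concept_accuracy(extractor):
    # Fraction of individual concept bits predicted correctly.
    hits = sum(
        pred == gold
        for x in INPUTS
        for pred, gold in zip(extractor(x), true_concepts(x))
    )
    return hits / (2 * len(INPUTS))


if __name__ == "__main__":
    # Fixed inference layer (XOR): flipped concepts still give correct labels.
    print(label_accuracy(flipped_extractor, xor_layer, xor_layer))
    # Learned inference layer: the task is AND, but a NOR layer paired
    # with the flipped extractor fits the labels equally well.
    print(label_accuracy(flipped_extractor, nor_layer, and_layer))
    # The concepts themselves are entirely wrong in both cases.
    print(concept_accuracy(flipped_extractor))
```

In this sketch, label supervision alone cannot tell the intended extractor apart from the flipped one, because both fit every label. When the inference layer is also learned, the extractor and the layer can compensate for each other, so neither is pinned down by the labels.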

