Semantic Sections: An Atlas-Native Feature Ontology for Obstructed Representation Spaces

arXiv:2603.20867

AI Analysis

For interpretability researchers, this work addresses a fundamental limitation of current feature ontologies in neural networks, offering a more accurate framework for understanding models whose representation spaces are obstructed.

The paper introduces 'semantic sections' as a new feature ontology for interpretability in obstructed representation spaces, where a single global feature direction can fail to exist. It formalizes the concept, proves that tree-supported propagation is always pathwise realizable, and identifies cycle consistency as the criterion for genuine globalization. Experiments on layer-16 atlases for Llama 3.2, Qwen 2.5, and Gemma 2 models show that semantic sections capture locally coherent meanings that raw global-vector similarity does not recover, whereas section-based identity recovery is perfect on certified supports.

Recent interpretability work often treats a feature as a single global direction, dictionary atom, or latent coordinate shared across contexts. We argue that this ontology can fail in obstructed representation spaces, where locally coherent meanings need not assemble into one globally consistent feature. We introduce an atlas-native replacement object, the semantic section: a transport-compatible family of local feature representatives defined over a context atlas. We formalize semantic sections, prove that tree-supported propagation is always pathwise realizable, and show that cycle consistency is the key criterion for genuine globalization. This yields a distinction between tree-local, globalizable, and twisted sections, with twisted sections capturing locally coherent but holonomy-obstructed meanings. We then develop a discovery-and-certification pipeline based on seeded propagation, synchronization across overlaps, defect-based pruning, cycle-aware taxonomy, and deduplication. Across layer-16 atlases for Llama 3.2 3B Instruct, Qwen 2.5 3B Instruct, and Gemma 2 2B IT, we find nontrivial populations of semantic sections, including cycle-supported globalizable and twisted regimes after deduplication. Most importantly, semantic identity is not recovered by raw global-vector similarity. Even certified globalizable sections show low cross-chart signed cosine similarity, and raw similarity baselines recover only a small fraction of true within-section pairs, often collapsing at moderate thresholds. By contrast, section-based identity recovery is perfect on certified supports. These results support semantic sections as a better feature ontology in obstructed regimes.
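The cycle-consistency criterion in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration rather than the paper's construction: charts are arranged in a single 3-cycle, transport between adjacent charts is modeled as a 2D rotation matrix, and the helper names (`rotation`, `holonomy_defect`, `classify`) and the tolerance are hypothetical. A local representative is carried around the cycle; if it returns to itself the section is labeled globalizable, otherwise twisted.

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix, used here as a stand-in transport map between charts."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def holonomy_defect(v, cycle_maps):
    """Carry v once around a closed cycle of charts and return the mismatch norm."""
    w = v
    for transport in cycle_maps:
        w = transport @ w
    return float(np.linalg.norm(w - v))

def classify(v, cycle_maps, tol=1e-8):
    """Label a representative by whether it survives transport around the cycle."""
    return "globalizable" if holonomy_defect(v, cycle_maps) <= tol else "twisted"

v = np.array([1.0, 0.0])

# Three transports that compose to the identity (total rotation 2*pi).
flat_cycle = [rotation(2 * np.pi / 3)] * 3
# Three transports that compose to a sign flip (total rotation pi).
twisted_cycle = [rotation(np.pi / 3)] * 3

flat_label = classify(v, flat_cycle)
twisted_label = classify(v, twisted_cycle)

# Signed cosine between the representatives in two adjacent charts of the
# cycle-consistent section: the raw vectors differ even though the section is one object.
rep0 = v
rep1 = flat_cycle[0] @ v
cos01 = float(rep0 @ rep1 / (np.linalg.norm(rep0) * np.linalg.norm(rep1)))

print(flat_label, twisted_label, round(cos01, 3))
```

In this toy setting the cycle-consistent section still has a negative signed cosine between adjacent chart representatives, which mirrors the abstract's point that raw global-vector similarity can miss pairs that belong to the same section.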
