LG
Nov 11, 2022

Emergence of Concepts in DNNs?

arXiv:2211.06137v1
h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses the interpretability of DNNs for researchers and practitioners, but it is incremental as it reviews and critiques existing approaches without introducing new methods.

The paper reviews methods for identifying concepts in deep neural networks' internal representations and discusses how conceptual spaces are shaped by a tradeoff between predictive accuracy and compression, concluding that while DNNs can represent non-trivial inferential relations, our ability to identify these concepts is severely limited.

The present paper reviews and discusses work from computer science that proposes to identify concepts in internal representations (hidden layers) of DNNs. It is examined, first, how existing methods actually identify concepts that are supposedly represented in DNNs. Second, it is discussed how conceptual spaces -- sets of concepts in internal representations -- are shaped by a tradeoff between predictive accuracy and compression. These issues are critically examined by drawing on philosophy. While there is evidence that DNNs are able to represent non-trivial inferential relations between concepts, our ability to identify concepts is severely limited.
