Artificial Neural Nets and the Representation of Human Concepts
This paper addresses a foundational question in machine learning: how artificial neural networks represent knowledge. The answer has implications for interpretability and AI theory.
The paper investigates whether artificial neural networks (ANNs) learn and represent human concepts in individual units, concluding that while ANNs can perform complex tasks and may learn such concepts, they do not store them in individual units.
What do artificial neural networks (ANNs) learn? The machine learning (ML) community shares the narrative that ANNs must develop abstract human concepts to perform complex tasks. Some go even further and believe that these concepts are stored in individual units of the network. Based on current research, I systematically investigate the assumptions underlying this narrative. I conclude that ANNs are indeed capable of performing complex prediction tasks, and that they may learn human and non-human concepts to do so. However, evidence indicates that ANNs do not represent these concepts in individual units.