The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision
This work helps researchers understand convolutional neural networks such as InceptionV1 by improving feature interpretability. The contribution is incremental, since it builds on existing sparse autoencoder methods.
The researchers applied sparse autoencoders to the early vision layers of InceptionV1 to extract interpretable features. They uncovered additional curve detectors that fill gaps left by the previously known ones, and decomposed some polysemantic neurons into more monosemantic features.
Recent work on sparse autoencoders (SAEs) has shown promise in extracting interpretable features from neural networks and addressing challenges with polysemantic neurons caused by superposition. In this paper, we apply SAEs to the early vision layers of InceptionV1, a well-studied convolutional neural network, with a focus on curve detectors. Our results demonstrate that SAEs can uncover new interpretable features not apparent from examining individual neurons, including additional curve detectors that fill in previous gaps. We also find that SAEs can decompose some polysemantic neurons into more monosemantic constituent features. These findings suggest SAEs are a valuable tool for understanding InceptionV1, and convolutional neural networks more generally.
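To make the idea of a sparse autoencoder concrete, the following is a minimal sketch of a common SAE formulation: a ReLU encoder, a linear decoder, and a loss that combines squared reconstruction error with an L1 penalty on the feature activations. It is not the authors' implementation. The class name `SparseAutoencoder`, the parameter `l1_coeff`, the initialization scale, and the choice to treat each input as the channel vector at one spatial position of a convolutional layer are all assumptions made for illustration. Pure Python is used to keep the sketch dependency-free; a practical version would use an array library and a training loop.

```python
import random


def matvec(weights, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class SparseAutoencoder:
    """Hypothetical minimal SAE: ReLU encoder, linear decoder."""

    def __init__(self, n_inputs, n_features, seed=0):
        rng = random.Random(seed)
        # Small random weights; the scale 0.1 is an arbitrary assumption.
        self.W_enc = [[rng.gauss(0, 0.1) for _ in range(n_inputs)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        self.W_dec = [[rng.gauss(0, 0.1) for _ in range(n_features)]
                      for _ in range(n_inputs)]
        self.b_dec = [0.0] * n_inputs

    def encode(self, x):
        # Subtract the decoder bias, apply the encoder, then ReLU.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.W_enc, centered)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, f):
        # Reconstruct the input as a linear combination of feature directions.
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Squared reconstruction error plus L1 sparsity penalty on features.
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        sparsity = sum(abs(v) for v in f)
        return recon + l1_coeff * sparsity


# Example: one activation vector with 4 channels, expanded to 8 features.
sae = SparseAutoencoder(n_inputs=4, n_features=8)
x = [0.5, -0.2, 0.1, 0.9]
features = sae.encode(x)
reconstruction = sae.decode(features)
```

In this sketch the feature dictionary is larger than the input dimension (8 features for 4 channels), which is what lets an SAE assign separate features to directions that a single polysemantic neuron would otherwise mix. The L1 term encourages only a few features to be active for any given input.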