Pushing the Limits of Capsule Networks
This work addresses feature representation in neural networks and is aimed at researchers. It is incremental: it builds on the existing CapsNet framework rather than introducing new methods.
The paper investigates Capsule Networks (CapsNets) by testing them on MNIST-like datasets that are harder in specific ways and by analyzing their internal embedding spaces and sources of error, with the aim of better understanding their performance and expressiveness.
Convolutional neural networks use pooling and other downsampling operations to achieve approximate translation invariance when detecting features, but their architecture does not explicitly represent where features are located relative to one another. As a result, unlike humans, they do not represent two instances of the same object in different orientations in a consistent way, so training them often requires extensive data augmentation and very deep networks. A team at Google Brain recently proposed Capsule Networks as an attempt to fix this problem. Whereas a standard CNN produces scalar outputs that indicate whether a feature is present, a CapsNet produces vector outputs: the length of each vector indicates whether an entity is present, and its orientation encodes properties of that entity, such as its pose. We want to stress test CapsNets in several incremental ways to better understand their performance and expressiveness. Broadly, the goals of our investigation are: (1) to test CapsNets on datasets that resemble MNIST but are harder in a specific way, and (2) to explore the internal embedding space and sources of error of CapsNets.
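To make the scalar-versus-vector distinction concrete, here is a minimal pure-Python sketch. It assumes the "squash" nonlinearity described in the original CapsNet paper, which rescales a capsule's raw vector so that its length falls in [0, 1) (read as the probability that the entity is present) while preserving its direction (read as the entity's pose). The capsule vectors below are hypothetical toy values, not outputs of a trained network, and the helper names `squash` and `length` are illustrative.

```python
import math


def squash(s, eps=1e-9):
    """Rescale vector s to length |s|^2 / (1 + |s|^2), keeping its direction."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    # eps avoids division by zero when s is the zero vector.
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


# A scalar CNN unit: one number that only says "feature present" (hypothetical).
cnn_activation = 0.9

# Two hypothetical capsule pre-activations: the same entity in two orientations.
# [4, -3] is [3, 4] rotated by 90 degrees, so both have raw length 5.
upright = squash([3.0, 4.0])
rotated = squash([4.0, -3.0])

# Same length -> same presence probability (25/26 for a raw length of 5) ...
print(round(length(upright), 4), round(length(rotated), 4))
# ... but different directions -> the capsule still records the pose change.
dot = sum(a * b for a, b in zip(upright, rotated))
print(round(dot, 6))
```

The point of the sketch is that a single scalar such as `cnn_activation` cannot distinguish the two orientations, while the capsule outputs agree on presence (equal lengths) and differ only in direction (their dot product is zero here because the raw vectors are orthogonal).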