What Do Deep CNNs Learn About Objects?
This work addresses the interpretability of deep learning models, a topic relevant to researchers and practitioners in computer vision. It appears incremental, however, because it builds on prior analyses of CNN representations.
The paper investigates the invariance of deep convolutional neural networks to object-class variations caused by 3D shape, pose, and photorealism, aiming to understand what these networks learn about objects.
Deep convolutional neural networks learn extremely powerful image representations, yet most of that power is hidden in the millions of deep-layer parameters. What exactly do these parameters represent? Recent work has started to analyse CNN representations, finding that, e.g., they are invariant to some 2D transformations (Fischer et al., 2014), but are confused by particular types of image noise (Nguyen et al., 2014). In this work, we delve deeper and ask: how invariant are CNNs to object-class variations caused by 3D shape, pose, and photorealism?