Digging Deep into the layers of CNNs: In Search of How CNNs Achieve View Invariance
This work addresses a fundamental problem in computer vision and offers researchers and practitioners insight into CNN behavior, although its contribution is incremental because it builds on the existing understanding of view invariance.
The paper investigates how Convolutional Neural Networks (CNNs) achieve view invariance by analyzing the view-manifold structure across network layers. It finds that invariance is achieved by separating the view manifolds rather than collapsing them, and it identifies specific layers as critical.
This paper studies the view-manifold structure in the feature spaces implied by the different layers of Convolutional Neural Networks (CNNs). It aims to answer several questions: Does the learned CNN representation achieve viewpoint invariance? How does it achieve viewpoint invariance? Is it achieved by collapsing the view manifolds, or by separating them while preserving them? At which layer is view invariance achieved? How can the structure of the view manifold at each layer of a deep convolutional neural network be quantified experimentally? How does fine-tuning a pre-trained CNN on a multi-view dataset affect the representation at each layer of the network? To answer these questions, we propose a methodology to quantify the deformation and degeneracy of view manifolds in CNN layers. We apply this methodology and report results that answer the aforementioned questions.
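The distinction between collapsing view manifolds and separating them while preserving them can be made concrete with a toy measurement. The sketch below is not the paper's methodology; it is an illustrative assumption that compares two quantities for the features of one layer: the average spread of the views within each object (how much of the view manifold survives) and the average distance between object centroids (how well the objects are separated). The function names, the dictionary layout, and the two-dimensional toy features are all hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def manifold_stats(features_by_object):
    """Hypothetical helper, not taken from the paper.

    features_by_object maps an object id to a list of feature vectors,
    one per viewpoint. Returns (within, between):
      within  - mean view spread inside each object's manifold
      between - mean distance between the object centroids
    """
    manifolds = list(features_by_object.values())
    within = sum(mean_pairwise_distance(m) for m in manifolds) / len(manifolds)
    between = mean_pairwise_distance([centroid(m) for m in manifolds])
    return within, between


def circle(cx, cy, r, n=8):
    """Toy view manifold: n viewpoints placed on a circle of radius r."""
    return [
        [cx + r * math.cos(2 * math.pi * k / n),
         cy + r * math.sin(2 * math.pi * k / n)]
        for k in range(n)
    ]


# Toy "layer" where each object's views still trace a circle (preserved).
preserved = {"chair": circle(0, 0, 1), "car": circle(10, 0, 1)}
# Toy "layer" where all views of an object map to one point (collapsed).
collapsed = {"chair": circle(0, 0, 0), "car": circle(10, 0, 0)}

print(manifold_stats(preserved))
print(manifold_stats(collapsed))
```

In both toy layers the objects are equally well separated, but only the first retains a nonzero within-object spread. A layer that achieves invariance by separation while preserving the manifolds would look like the first case; a layer that achieves it by collapse would look like the second.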