Decision boundaries and convex hulls in the feature space that deep learning functions learn from images
This work addresses a fundamental gap in our understanding of how deep learning models process images, which matters to researchers and practitioners in AI and computer vision; its contribution to the theoretical foundations is incremental.
The paper studies the low-dimensional feature space that deep neural networks learn for image classification. It develops methods for analyzing decision boundaries and convex hulls in that space and finds that their geometric arrangement differs significantly from that in pixel space, which offers insight into adversarial vulnerabilities and model interpretation.
The success of deep neural networks in image classification and learning can be partly attributed to the features they extract from images. There is frequent speculation about the properties of the low-dimensional manifold that models extract and learn from images. However, this low-dimensional space is not sufficiently understood, either theoretically or empirically. In image classification models, the last hidden layer is where images of each class are separated from those of other classes, and it also has the fewest features. Here, we develop methods and formulations to study that feature space for any model. We study the partitioning of the domain in feature space, identify regions guaranteed to have certain classifications, and investigate the implications for the pixel space. We observe that the geometric arrangement of decision boundaries in feature space differs significantly from that in pixel space, providing insights about adversarial vulnerabilities, image morphing, extrapolation, ambiguity in classification, and the mathematical understanding of image classification models.
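As a rough illustration of what a region "guaranteed to have a certain classification" can look like, the sketch below assumes (the abstract does not state this) that the model's output is an affine map of the last-hidden-layer features followed by an argmax. Under that assumption, each pairwise decision boundary is a hyperplane in feature space and each decision region is convex, so any point in the convex hull of features that share a predicted class receives that same class. The weights `W` and `b`, the features `Z`, and the helper names are hypothetical stand-ins drawn at random; they are not the paper's code, model, or data.

```python
import numpy as np


def predict(W, b, z):
    """Return the argmax class of the affine logits z @ W.T + b for each row of z."""
    return np.argmax(z @ W.T + b, axis=-1)


def pairwise_boundary(W, b, i, j):
    """Normal vector and offset of the hyperplane where the logits of classes i and j are equal."""
    return W[i] - W[j], b[i] - b[j]


rng = np.random.default_rng(0)
d, k = 8, 3                      # assumed feature dimension and number of classes
W = rng.normal(size=(k, d))      # stand-in for trained output-layer weights
b = rng.normal(size=k)           # stand-in for trained output-layer biases

Z = rng.normal(size=(500, d))    # stand-in for last-hidden-layer features of images
labels = predict(W, b, Z)

# Use the most populated predicted class so the point set is never empty.
cls = int(np.bincount(labels, minlength=k).argmax())
Z_cls = Z[labels == cls]

# Each Dirichlet sample is nonnegative and sums to 1, so each row of
# hull_points is a convex combination of the rows of Z_cls.
alpha = rng.dirichlet(np.ones(len(Z_cls)), size=200)
hull_points = alpha @ Z_cls

# Under the affine-output assumption, every hull point keeps the class `cls`.
hull_labels = predict(W, b, hull_points)
print("all hull points keep the class:", bool((hull_labels == cls).all()))

# Hyperplane separating classes 0 and 1 in feature space.
normal, offset = pairwise_boundary(W, b, 0, 1)
```

With a real model, `Z` would be the activations of the last hidden layer for a set of images and `W`, `b` the parameters of the output layer. The guarantee applies only in feature space: a convex combination of two images in pixel space does not, in general, map to the corresponding convex combination of their features.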