Kalanit Grill-Spector

2 Papers

CV · Jun 16, 2020
Validation and generalization of pixel-wise relevance in convolutional neural networks trained for face classification

Jñani Crawford, Eshed Margalit, Kalanit Grill-Spector et al.

The increased use of convolutional neural networks for face recognition in science, governance, and broader society has created an acute need for methods that can show how these 'black box' decisions are made. To be interpretable and useful to humans, such a method should convey a model's learned classification strategy in a way that is robust to random initializations or spurious correlations in input data. To this end, we applied the decompositional pixel-wise attribution method of layer-wise relevance propagation (LRP) to resolve the decisions of several classes of VGG-16 models trained for face recognition. We then quantified how these relevance measures vary with and generalize across key model parameters, such as the pretraining dataset (ImageNet or VGGFace), the finetuning task (gender or identity classification), and random initializations of model weights. Using relevance-based image masking, we find that relevance maps for face classification prove generally stable across random initializations, and can generalize across finetuning tasks. However, there is markedly less generalization across pretraining datasets, indicating that ImageNet- and VGGFace-trained models sample face information differently even as they achieve comparably high classification performance. Fine-grained analyses of relevance maps across models revealed asymmetries in generalization that point to specific benefits of choice parameters, and suggest that it may be possible to find an underlying set of important face image pixels that drive decisions across convolutional neural networks and tasks. Finally, we evaluated model decision weighting against human measures of similarity, providing a novel framework for interpreting face recognition decisions across human and machine.
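The abstract mentions relevance-based image masking without giving the procedure. As a rough illustration only, the sketch below masks the most relevant fraction of pixels in an image. The function name `mask_top_relevant`, the `fraction` and `fill` parameters, and the hard-coded relevance values are all assumptions made for this example; in practice the relevance map would come from LRP applied to a trained model, and the paper's actual masking scheme may differ.

```python
def mask_top_relevant(image, relevance, fraction, fill=0.0):
    """Return a copy of `image` with the most relevant pixels set to `fill`.

    `image` and `relevance` are 2-D lists of the same shape (hypothetical
    stand-ins for a grayscale face image and its relevance map).
    `fraction` is the share of pixels to mask, between 0 and 1.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    height, width = len(image), len(image[0])

    # Rank every pixel coordinate by its relevance score, highest first.
    coords = [(r, c) for r in range(height) for c in range(width)]
    coords.sort(key=lambda rc: relevance[rc[0]][rc[1]], reverse=True)

    # Mask the top-ranked pixels on a copy so the input stays untouched.
    n_masked = round(fraction * len(coords))
    masked = [row[:] for row in image]
    for r, c in coords[:n_masked]:
        masked[r][c] = fill
    return masked


# Toy 2x2 example with made-up pixel and relevance values.
image = [[1.0, 2.0], [3.0, 4.0]]
relevance = [[0.1, 0.9], [0.5, 0.2]]
masked = mask_top_relevant(image, relevance, 0.5)
print(masked)
```

One way to use such a function, under the same assumptions, is to mask an image with the relevance map from one model and measure how much the accuracy of another model drops, which gives a simple proxy for how well relevance generalizes between them.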

CV · Oct 31, 2018
The Effect of Learning Strategy versus Inherent Architecture Properties on the Ability of Convolutional Neural Networks to Develop Transformation Invariance

Megha Srivastava, Kalanit Grill-Spector

Object recognition is becoming an increasingly common ML task, and recent research demonstrating the vulnerability of CNNs to attacks and small image perturbations makes it necessary to fully understand the foundations of object recognition. We focus on understanding the mechanisms behind how neural networks generalize to spatial transformations of complex objects. While humans excel at discriminating between objects shown at new positions, orientations, and scales, past results demonstrate that this may be limited to familiar objects: humans demonstrate low tolerance to spatial variation for purposefully constructed novel objects. Because training artificial neural networks from scratch is similar to showing novel objects to humans, we seek to understand the factors influencing the tolerance of CNNs to spatial transformations. We conduct a thorough empirical examination of seven Convolutional Neural Network (CNN) architectures. By training on a controlled face image dataset, we measure model accuracy across different degrees of five transformations: position, size, rotation, Gaussian blur, and resolution transformation due to resampling. We also examine how learning strategy affects generalizability by examining the effect that different amounts of pre-training have on model robustness. Overall, we find that the most significant contributor to transformation invariance is pre-training on a large, diverse image dataset. Moreover, while AlexNet tends to be the least robust network, VGG and ResNet architectures demonstrate higher robustness for different transformations. Along with kernel visualizations and qualitative analyses, we examine differences between learning strategy and inherent architectural properties in contributing to invariance to transformations, providing valuable information towards understanding how to achieve greater robustness to transformations in CNNs.
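The evaluation described above, measuring accuracy across degrees of a transformation, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: `shift_right`, `toy_classifier`, and `accuracy_by_degree` are hypothetical names, the rule-based classifier stands in for a trained CNN, and only the position transformation is shown.

```python
def shift_right(image, pixels, fill=0):
    """Translate a 2-D image (list of rows) right by `pixels`, padding with `fill`."""
    width = len(image[0])
    pixels = min(pixels, width)  # a shift past the edge leaves only padding
    return [[fill] * pixels + row[:width - pixels] for row in image]


def toy_classifier(image):
    """Stand-in for a trained model: class 0 if the brightest column is in
    the left half of the image, class 1 otherwise."""
    width = len(image[0])
    column_sums = [sum(row[c] for row in image) for c in range(width)]
    brightest = column_sums.index(max(column_sums))
    return 0 if brightest < width // 2 else 1


def accuracy_by_degree(classifier, dataset, transform, degrees):
    """Map each transformation degree to accuracy on the transformed dataset."""
    results = {}
    for degree in degrees:
        correct = sum(
            classifier(transform(image, degree)) == label
            for image, label in dataset
        )
        results[degree] = correct / len(dataset)
    return results


# Made-up 1x4 "images": a bright pixel on the left (class 0) or right (class 1).
dataset = [([[1, 0, 0, 0]], 0), ([[0, 0, 1, 0]], 1)]
results = accuracy_by_degree(toy_classifier, dataset, shift_right, [0, 1, 2])
print(results)
```

The same loop could be reused for size, rotation, blur, or resolution by passing a different `transform` function, with the toy classifier replaced by whichever model is being evaluated.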