Restricted Receptive Fields for Face Verification
The paper proposes an inherently interpretable method for face verification. It addresses the need for reliable explanations in computer vision, although the approach is incremental.
The paper tackles interpretability in deep neural networks for face verification. It proposes a face similarity metric that decomposes global similarity into contributions from restricted receptive fields. On 112x112 face images, the metric achieves competitive performance with 28x28 patches and surpasses state-of-the-art methods with 56x56 patches.
Understanding how deep neural networks make decisions is crucial for analyzing their behavior and diagnosing failure cases. In computer vision, a common approach to improve interpretability is to assign importance to individual pixels using post-hoc methods. Although they are widely used to explain black-box models, their fidelity to the model's actual reasoning is uncertain due to the lack of reliable evaluation metrics. This limitation motivates an alternative approach, which is to design models whose decision processes are inherently interpretable. To this end, we propose a face similarity metric that breaks down global similarity into contributions from restricted receptive fields. Our method defines the similarity between two face images as the sum of patch-level similarity scores, providing a locally additive explanation without relying on post-hoc analysis. We show that the proposed approach achieves competitive verification performance even with patches as small as 28x28 within 112x112 face images, and surpasses state-of-the-art methods when using 56x56 patches.
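The additive decomposition described above, where the image-level score is the sum of patch-level scores, can be illustrated with a minimal sketch. Several details here are assumptions, not taken from the paper. The patches are taken as a non-overlapping grid, so 28x28 patches on a 112x112 image give a 4x4 grid. Each patch is compared only with the patch at the same location in the other image. The patch-level score is a cosine similarity. The hypothetical `embed_patch` function is a random linear projection standing in for a trained network whose receptive field is restricted to one patch.

```python
import numpy as np


def split_into_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles.

    Returns an array of shape (H // patch, W // patch, patch, patch, C).
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    # Row index i*patch + a and column index j*patch + b map to tile (i, j), offset (a, b).
    return image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)


def embed_patch(tile: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in encoder: it sees only one tile (a restricted receptive field).

    A random linear projection followed by L2 normalisation; a real model would be trained.
    """
    v = proj @ tile.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12)


def patch_similarity_map(img_a, img_b, patch, proj) -> np.ndarray:
    """Cosine similarity between co-located patches of two images (assumed pairing)."""
    tiles_a = split_into_patches(img_a, patch)
    tiles_b = split_into_patches(img_b, patch)
    gh, gw = tiles_a.shape[:2]
    sims = np.zeros((gh, gw))
    for i in range(gh):
        for j in range(gw):
            sims[i, j] = embed_patch(tiles_a[i, j], proj) @ embed_patch(tiles_b[i, j], proj)
    return sims


def face_similarity(img_a, img_b, patch, proj):
    """Global score as the sum of patch scores; the map itself is the local explanation."""
    sims = patch_similarity_map(img_a, img_b, patch, proj)
    return sims.sum(), sims


# Usage with random stand-in data: two 112x112 RGB "faces" and 28x28 patches.
rng = np.random.default_rng(0)
a = rng.random((112, 112, 3))
b = rng.random((112, 112, 3))
proj = rng.standard_normal((64, 28 * 28 * 3))
total, sims = face_similarity(a, b, 28, proj)
print(sims.shape, total)
```

Because the global score is literally `sims.sum()`, each entry of the 4x4 map states exactly how much that region contributed, with no post-hoc attribution step. The design choice this sketch highlights is that interpretability comes from the model's structure (restricted receptive fields plus summation), not from an explanation method applied afterwards.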