Feature Representation in Deep Metric Embeddings
This research provides insights into the interpretability of deep metric learning models for facial recognition, which is important for understanding their decision-making process and potential biases.
This study investigates the features used by deep metric learning (DML) models for facial identity discrimination. It finds that DML embeddings can distinguish intra-class attributes such as beards and glasses with 90.0% and 76.0% accuracy, respectively, and extra-class attributes such as gender, skin tone, and age with accuracies of 99.3%, 99.3%, and 94.1%, respectively.
In deep metric learning (DML), high-level input data are mapped into a lower-level representation (embedding) space such that samples from the same class lie close together, while samples from different classes lie further apart. In this embedding space, only a single inference sample from each known class is required to discriminate between classes accurately. However, the features a DML model uses to discriminate between classes, and the importance of each feature in the training process, are unknown. To investigate this, this study takes embeddings trained to discriminate faces (identities) and uses unsupervised clustering to identify the features involved in facial identity discrimination by examining their representation within the embedding space. The study is split into two cases: intra-class sub-discrimination, which considers attributes that vary within a single identity, such as beards and emotions; and extra-class sub-discrimination, which considers attributes that differ between identities (people), such as gender, skin tone, and age. In the intra-class case, the inference process distinguishes common attributes within single identities, achieving 90.0% and 76.0% accuracy for beards and glasses, respectively. The system can also perform extra-class sub-discrimination with high accuracy: 99.3%, 99.3%, and 94.1% for gender, skin tone, and age, respectively.
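The idea of probing an embedding space with unsupervised clustering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the embeddings are synthetic Gaussian vectors rather than outputs of a trained face model, the binary attribute (for example, glasses versus no glasses) is simulated by shifting one dimension, and a hand-written two-cluster k-means stands in for whatever clustering method the study actually used. The helper names `two_means` and `cluster_accuracy` are hypothetical.

```python
import numpy as np


def two_means(embeddings, n_iter=50):
    """Minimal k-means with k=2 (an illustrative stand-in, not the study's method)."""
    # Deterministic init: sample 0, plus the sample farthest from it.
    c0 = embeddings[0]
    c1 = embeddings[np.argmax(np.linalg.norm(embeddings - c0, axis=1))]
    centroids = np.stack([c0, c1])
    assign = np.zeros(len(embeddings), dtype=int)
    for _ in range(n_iter):
        # Distance of every embedding to both centroids, shape (n_samples, 2).
        dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.stack([
            embeddings[assign == k].mean(axis=0) if np.any(assign == k) else centroids[k]
            for k in range(2)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign


def cluster_accuracy(assign, labels):
    """Agreement between cluster ids and a binary attribute.

    Cluster ids are arbitrary, so the better of the two possible
    id-to-attribute mappings is reported.
    """
    agree = float(np.mean(assign == labels))
    return max(agree, 1.0 - agree)


# Synthetic stand-in for embeddings (assumption: 16-D, Gaussian noise).
rng = np.random.default_rng(0)
dim = 16
labels = np.repeat([0, 1], 100)          # simulated binary attribute
offset = np.zeros(dim)
offset[0] = 5.0                          # attribute shifts one embedding dimension
embeddings = rng.normal(scale=0.5, size=(200, dim)) + labels[:, None] * offset

assign = two_means(embeddings)
accuracy = cluster_accuracy(assign, labels)
print(f"cluster/attribute agreement: {accuracy:.3f}")
```

With real data, `embeddings` would be replaced by the model's output vectors and `labels` by annotated attribute values. A high agreement score would then suggest that the attribute is encoded as a separable direction in the embedding space.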