CV · Jun 15, 2021
Canonical Face Embeddings
David McNeely-White, Ben Sattelberg, Nathaniel Blanchard et al.
We present evidence that many common convolutional neural networks (CNNs) trained for face verification learn functions that are nearly equivalent under rotation. More specifically, we demonstrate that one face verification model's embeddings (i.e., last-layer activations) can be compared directly to another model's embeddings after only a rotation or linear transformation, with little performance penalty. This finding is demonstrated using IJB-C 1:1 verification across the combinations of ten modern off-the-shelf CNN-based face verification models which vary in training dataset, CNN architecture, method of angular loss calculation, or some combination of the three. These networks achieve a mean true accept rate of 0.96 at a false accept rate of 0.01. When instead evaluating embeddings generated from two CNNs, where one CNN's embeddings are mapped with a linear transformation, the mean true accept rate drops to 0.95 using the same verification paradigm. Restricting these linear maps to only perform rotation produces a mean true accept rate of 0.91. The existence of these mappings suggests that models learn a common representation despite variation in training or structure. We discuss the broad implications of this result, including an example regarding face template security.
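The two kinds of mapping described in this abstract can be illustrated with a minimal sketch. It assumes paired embeddings of the same images from two models, stored as row-aligned arrays; the function names (`fit_linear_map`, `fit_rotation`, `cosine_scores`) and the synthetic data are hypothetical, and the fitting procedures shown (ordinary least squares and orthogonal Procrustes) are standard techniques that may differ from the authors' own estimation method.

```python
import numpy as np


def fit_linear_map(src, dst):
    """Least-squares matrix M minimizing ||src @ M - dst||_F."""
    M, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return M


def fit_rotation(src, dst):
    """Orthogonal Procrustes: orthogonal R minimizing ||src @ R - dst||_F.

    R is orthogonal but not forced to det = +1, so it may include a reflection.
    """
    U, _, Vt = np.linalg.svd(src.T @ dst)
    return U @ Vt


def cosine_scores(a, b):
    """Row-wise cosine similarity between two (n, d) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


if __name__ == "__main__":
    # Synthetic stand-in for two models' embeddings (assumption, not real data):
    # model B is a rotated, slightly noisy copy of model A.
    rng = np.random.default_rng(0)
    emb_a = rng.normal(size=(500, 16))
    true_rot, _ = np.linalg.qr(rng.normal(size=(16, 16)))
    emb_b = emb_a @ true_rot + 0.01 * rng.normal(size=(500, 16))

    R = fit_rotation(emb_a, emb_b)
    M = fit_linear_map(emb_a, emb_b)
    print("mean cosine after rotation:  ", cosine_scores(emb_a @ R, emb_b).mean())
    print("mean cosine after linear map:", cosine_scores(emb_a @ M, emb_b).mean())
```

In a real evaluation the map would be fitted on one set of images and the verification scores computed on held-out pairs, so that the reported accept rates are not inflated by fitting and testing on the same data.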
CV · Oct 5, 2020
Exploring the Interchangeability of CNN Embedding Spaces
David McNeely-White, Benjamin Sattelberg, Nathaniel Blanchard et al.
CNN feature spaces can be linearly mapped and consequently are often interchangeable. This equivalence holds across variations in architectures, training datasets, and network tasks. Specifically, we mapped between 10 image-classification CNNs and between 4 facial-recognition CNNs. When image embeddings generated by one CNN are transformed into embeddings corresponding to the feature space of a second CNN trained on the same task, their respective image classification or face verification performance is largely preserved. For CNNs trained on the same classes and sharing a common backend-logit (softmax) architecture, a linear mapping may always be calculated directly from the backend layer weights. However, the case of a closed-set analysis with perfect knowledge of classifiers is limiting. Therefore, empirical methods of estimating mappings are presented for both the closed-set image classification task and the open-set task of face recognition. The results presented expose the essentially interchangeable nature of CNN embeddings for two important and common recognition tasks. The implications are far-reaching, suggesting an underlying commonality between representations learned by networks designed and trained for a common task. One practical implication is that face embeddings from some commonly used CNNs can be compared using these mappings.
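The closed-set case, in which a map is computed directly from the backend layer weights, can be sketched as follows. This is one plausible construction under stated assumptions and not necessarily the authors' derivation: each model is assumed to end in an affine logit layer `logits = features @ W.T + b` over the same classes, and the map is chosen so that model B's classifier, applied to mapped features from model A, reproduces model A's logits in the least-squares sense via the Moore-Penrose pseudo-inverse. The names `map_from_classifier_weights` and `apply_map` are hypothetical.

```python
import numpy as np


def map_from_classifier_weights(W_a, b_a, W_b, b_b):
    """Return (M, c) such that W_b @ (M @ f_a + c) + b_b approximates W_a @ f_a + b_a.

    W_a: (num_classes, d_a), W_b: (num_classes, d_b); b_a, b_b: (num_classes,).
    The match is exact when W_b has full row rank, otherwise least-squares.
    """
    W_b_pinv = np.linalg.pinv(W_b)   # shape (d_b, num_classes)
    M = W_b_pinv @ W_a               # shape (d_b, d_a)
    c = W_b_pinv @ (b_a - b_b)       # shape (d_b,)
    return M, c


def apply_map(features_a, M, c):
    """Map a batch of model-A features, shape (n, d_a), into model B's space."""
    return features_a @ M.T + c


if __name__ == "__main__":
    # Random stand-ins for two classifier heads (assumption, not trained weights).
    rng = np.random.default_rng(0)
    num_classes, d_a, d_b = 5, 6, 8
    W_a, b_a = rng.normal(size=(num_classes, d_a)), rng.normal(size=num_classes)
    W_b, b_b = rng.normal(size=(num_classes, d_b)), rng.normal(size=num_classes)

    M, c = map_from_classifier_weights(W_a, b_a, W_b, b_b)
    feats_a = rng.normal(size=(3, d_a))
    logits_a = feats_a @ W_a.T + b_a
    logits_via_b = apply_map(feats_a, M, c) @ W_b.T + b_b
    print("max logit difference:", np.abs(logits_a - logits_via_b).max())
```

When the number of classes exceeds the feature dimension of model B, the pseudo-inverse gives only the closest achievable logits, so the predicted classes may occasionally differ between the original and the mapped features.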