Jacob Gutierrez

CV
h-index27
3papers
5citations
Novelty27%
AI Score29

3 Papers

CVMay 24, 2024Code
Goldilocks Test Sets for Face Verification

Haiyu Wu, Sicong Tian, Aman Bhatta et al.

Reported face verification accuracy has reached a plateau on current well-known test sets. As a result, some difficult test sets have been assembled by reducing the image quality or adding artifacts to the image. However, we argue that test sets can be challenging without artificially reducing the image quality because the face recognition (FR) models suffer from correctly recognizing 1) the pairs from the same identity (i.e., genuine pairs) with a large face attribute difference, 2) the pairs from different identities (i.e., impostor pairs) with a small face attribute difference, and 3) the pairs of similar-looking identities (e.g., twins and relatives). We propose three challenging test sets to reveal important but ignored weaknesses of the existing FR algorithms. To challenge models on variation of facial attributes, we propose Hadrian and Eclipse to address facial hair differences and face exposure differences. The images in both test sets are high-quality and collected in a controlled environment. To challenge FR models on similar-looking persons, we propose twins-IND, which contains images from a dedicated twins dataset. The LFW test protocol is used to structure the proposed test sets. Moreover, we introduce additional rules to assemble "Goldilocks1" level test sets, including 1) restricted number of occurrence of hard samples, 2) equal chance evaluation across demographic groups, and 3) constrained identity overlap across validation folds. Quantitatively, without further processing the images, the proposed test sets have on-par or higher difficulties than the existing test sets. The datasets are available at: https: //github.com/HaiyuWu/SOTA-Face-Recognition-Train-and-Test.

CVMay 15, 2024
Identity Overlap Between Face Recognition Train/Test Data: Causing Optimistic Bias in Accuracy Measurement

Haiyu Wu, Sicong Tian, Jacob Gutierrez et al.

A fundamental tenet of pattern recognition is that overlap between training and testing sets causes an optimistic accuracy estimate. Deep CNNs for face recognition are trained for N-way classification of the identities in the training set. Accuracy is commonly estimated as average 10-fold classification accuracy on image pairs from test sets such as LFW, CALFW, CPLFW, CFP-FP and AgeDB-30. Because train and test sets have been independently assembled, images and identities in any given test set may also be present in any given training set. In particular, our experiments reveal a surprising degree of identity and image overlap between the LFW family of test sets and the MS1MV2 training set. Our experiments also reveal identity label noise in MS1MV2. We compare accuracy achieved with same-size MS1MV2 subsets that are identity-disjoint and not identity-disjoint with LFW, to reveal the size of the optimistic bias. Using more challenging test sets from the LFW family, we find that the size of the optimistic bias is larger for more challenging test sets. Our results highlight the lack of and the need for identity-disjoint train and test methodology in face recognition research.

CVJul 23, 2025
A Comprehensive Evaluation Framework for the Study of the Effects of Facial Filters on Face Recognition Accuracy

Kagan Ozturk, Louisa Conwill, Jacob Gutierrez et al.

Facial filters are now commonplace for social media users around the world. Previous work has demonstrated that facial filters can negatively impact automated face recognition performance. However, these studies focus on small numbers of hand-picked filters in particular styles. In order to more effectively incorporate the wide ranges of filters present on various social media applications, we introduce a framework that allows for larger-scale study of the impact of facial filters on automated recognition. This framework includes a controlled dataset of face images, a principled filter selection process that selects a representative range of filters for experimentation, and a set of experiments to evaluate the filters' impact on recognition. We demonstrate our framework with a case study of filters from the American applications Instagram and Snapchat and the Chinese applications Meitu and Pitu to uncover cross-cultural differences. Finally, we show how the filtering effect in a face embedding space can easily be detected and restored to improve face recognition performance.