Michael C. King

h-index24

16papers

298citations

Novelty32%

AI Score39

Ranked #81,058 of 194,257 authors (top 42%)#27,366 in CV (top 46%)

16 Papers

15.3CVJun 4, 2022Code

Face Recognition Accuracy Across Demographics: Shining a Light Into the Problem

Haiyu Wu, Vítor Albiero, K. S. Krishnapriya et al.

We explore varying face recognition accuracy across demographic groups as a phenomenon partly caused by differences in face illumination. We observe that for a common operational scenario with controlled image acquisition, there is a large difference in face region brightness between African-American and Caucasian, and also a smaller difference between male and female. We show that impostor image pairs with both faces under-exposed, or both overexposed, have an increased false match rate (FMR). Conversely, image pairs with strongly different face brightness have a decreased similarity measure. We propose a brightness information metric to measure variation in brightness in the face and show that face brightness that is too low or too high has reduced information in the face region, providing a cause for the lower accuracy. Based on this, for operational scenarios with controlled image acquisition, illumination should be adjusted for each individual to obtain appropriate face image brightness. This is the first work that we are aware of to explore how the level of brightness of the skin region in a pair of face images (rather than a single image) impacts face recognition accuracy, and to evaluate this as a systematic factor causing unequal accuracy across demographics. The code is at https://github.com/HaiyuWu/FaceBrightness.

12.2CVJun 10, 2022

The Gender Gap in Face Recognition Accuracy Is a Hairy Problem

Aman Bhatta, Vítor Albiero, Kevin W. Bowyer et al.

It is broadly accepted that there is a "gender gap" in face recognition accuracy, with females having higher false match and false non-match rates. However, relatively little is known about the cause(s) of this gender gap. Even the recent NIST report on demographic effects lists "analyze cause and effect" under "what we did not do". We first demonstrate that female and male hairstyles have important differences that impact face recognition accuracy. In particular, compared to females, male facial hair contributes to creating a greater average difference in appearance between different male faces. We then demonstrate that when the data used to estimate recognition accuracy is balanced across gender for how hairstyles occlude the face, the initially observed gender gap in accuracy largely disappears. We show this result for two different matchers, and analyzing images of Caucasians and of African-Americans. These results suggest that future research on demographic variation in accuracy should include a check for balanced quality of the test data as part of the problem formulation. To promote reproducible research, matchers, attribute classifiers, and datasets used in this research are/will be publicly available.

6.8CVSep 8, 2023

Impact of Blur and Resolution on Demographic Disparities in 1-to-Many Facial Identification

Aman Bhatta, Gabriella Pangelinan, Michael C. King et al.

Most studies to date that have examined demographic variations in face recognition accuracy have analyzed 1-to-1 matching accuracy, using images that could be described as "government ID quality". This paper analyzes the accuracy of 1-to-many facial identification across demographic groups, and in the presence of blur and reduced resolution in the probe image as might occur in "surveillance camera quality" images. Cumulative match characteristic curves (CMC) are not appropriate for comparing propensity for rank-one recognition errors across demographics, and so we use three metrics for our analysis: (1) the well-known d' metric between mated and non-mated score distributions, and introduced in this work, (2) absolute score difference between thresholds in the high-similarity tail of the non-mated and the low-similarity tail of the mated distribution, and (3) distribution of (mated - non-mated rank-one scores) across the set of probe images. We find that demographic variation in 1-to-many accuracy does not entirely follow what has been observed in 1-to-1 matching accuracy. Also, different from 1-to-1 accuracy, demographic comparison of 1-to-many accuracy can be affected by different numbers of identities and images across demographics. More importantly, we show that increased blur in the probe image, or reduced resolution of the face in the probe image, can significantly increase the false positive identification rate. And we show that the demographic variation in these high blur or low resolution conditions is much larger for male / female than for African-American / Caucasian. The point that 1-to-many accuracy can potentially collapse in the context of processing "surveillance camera quality" probe images against a "government ID quality" gallery is an important one.

3.9CVApr 14, 2023

Exploring Causes of Demographic Variations In Face Recognition Accuracy

Gabriella Pangelinan, K. S. Krishnapriya, Vitor Albiero et al.

In recent years, media reports have called out bias and racism in face recognition technology. We review experimental results exploring several speculated causes for asymmetric cross-demographic performance. We consider accuracy differences as represented by variations in non-mated (impostor) and / or mated (genuine) distributions for 1-to-1 face matching. Possible causes explored include differences in skin tone, face size and shape, imbalance in number of identities and images in the training data, and amount of face visible in the test data ("face pixels"). We find that demographic differences in face pixel information of the test images appear to most directly impact the resultant differences in face recognition accuracy.

10.1CVOct 13, 2022Code

Consistency and Accuracy of CelebA Attribute Values

Haiyu Wu, Grace Bezold, Manuel Günther et al.

We report the first systematic analysis of the experimental foundations of facial attribute classification. Two annotators independently assigning attribute values shows that only 12 of 40 common attributes are assigned values with >= 95% consistency, and three (high cheekbones, pointed nose, oval face) have essentially random consistency. Of 5,068 duplicate face appearances in CelebA, attributes have contradicting values on from 10 to 860 of the 5,068 duplicates. Manual audit of a subset of CelebA estimates error rates as high as 40% for (no beard=false), even though the labeling consistency experiment indicates that no beard could be assigned with >= 95% consistency. Selecting the mouth slightly open (MSO) for deeper analysis, we estimate the error rate for (MSO=true) at about 20% and (MSO=false) at about 2%. A corrected version of the MSO attribute values enables learning a model that achieves higher accuracy than previously reported for MSO. Corrected values for CelebA MSO are available at https://github.com/HaiyuWu/CelebAMSO.

4.9OCApr 5

Gramians for a New Class of Nonlinear Control Systems Using Koopman and a Novel Generalized SVD

Brian Brown, Michael King

Certified model reduction for high-dimensional nonlinear control systems remains challenging: unlike balanced truncation for LTI systems, most nonlinear reduction methods either lack computable worst-case error bounds or rely on intractable PDEs. Data-driven Koopman/DMDc surrogates improve tractability, but standard \emph{input lifting} can distort the physical input-energy metric, so $H_\infty$ and Hankel-based bounds computed on the lifted model may be valid only in a lifted-input norm and need not certify the original system. We address this metric mismatch by a Generalized Singular Value Decomposition (GSVD)-based construction that represents general (including non-affine) input nonlinearities in an LTI-like lifted form with a \emph{pointwise norm-preserving} input map $v(x,u)$ satisfying $\|v(x,u)\|_2=\|u\|_2$ and constant matrices $A,B$. This preserves strict causality (constant $B$, no input-history augmentation) and yields computable Hankel-singular-value-based $H_\infty$ error certificates in the physical input norm for reduced-order surrogates. We illustrate the method on a 25-dimensional Hodgkin--Huxley network with saturating optogenetic actuation, reducing to a single dominant mode while retaining certified error bounds.

5.2CVMay 24, 2024Code

Goldilocks Test Sets for Face Verification

Haiyu Wu, Sicong Tian, Aman Bhatta et al.

Reported face verification accuracy has reached a plateau on current well-known test sets. As a result, some difficult test sets have been assembled by reducing the image quality or adding artifacts to the image. However, we argue that test sets can be challenging without artificially reducing the image quality because the face recognition (FR) models suffer from correctly recognizing 1) the pairs from the same identity (i.e., genuine pairs) with a large face attribute difference, 2) the pairs from different identities (i.e., impostor pairs) with a small face attribute difference, and 3) the pairs of similar-looking identities (e.g., twins and relatives). We propose three challenging test sets to reveal important but ignored weaknesses of the existing FR algorithms. To challenge models on variation of facial attributes, we propose Hadrian and Eclipse to address facial hair differences and face exposure differences. The images in both test sets are high-quality and collected in a controlled environment. To challenge FR models on similar-looking persons, we propose twins-IND, which contains images from a dedicated twins dataset. The LFW test protocol is used to structure the proposed test sets. Moreover, we introduce additional rules to assemble "Goldilocks1" level test sets, including 1) restricted number of occurrence of hard samples, 2) equal chance evaluation across demographic groups, and 3) constrained identity overlap across validation folds. Quantitatively, without further processing the images, the proposed test sets have on-par or higher difficulties than the existing test sets. The datasets are available at: https: //github.com/HaiyuWu/SOTA-Face-Recognition-Train-and-Test.

5.6CVApr 28, 2021Code

Does Face Recognition Error Echo Gender Classification Error?

Ying Qiu, Vítor Albiero, Michael C. King et al.

This paper is the first to explore the question of whether images that are classified incorrectly by a face analytics algorithm (e.g., gender classification) are any more or less likely to participate in an image pair that results in a face recognition error. We analyze results from three different gender classification algorithms (one open-source and two commercial), and two face recognition algorithms (one open-source and one commercial), on image sets representing four demographic groups (African-American female and male, Caucasian female and male). For impostor image pairs, our results show that pairs in which one image has a gender classification error have a better impostor distribution than pairs in which both images have correct gender classification, and so are less likely to generate a false match error. For genuine image pairs, our results show that individuals whose images have a mix of correct and incorrect gender classification have a worse genuine distribution (increased false non-match rate) compared to individuals whose images all have correct gender classification. Thus, compared to images that generate correct gender classification, images that generate gender classification errors do generate a different pattern of recognition errors, both better (false match) and worse (false non-match).

11.8CVJan 15, 2025

Lights, Camera, Matching: The Role of Image Illumination in Fair Face Recognition

Gabriella Pangelinan, Grace Bezold, Haiyu Wu et al.

Facial brightness is a key image quality factor impacting face recognition accuracy differentials across demographic groups. In this work, we aim to decrease the accuracy gap between the similarity score distributions for Caucasian and African American female mated image pairs, as measured by d' between distributions. To balance brightness across demographic groups, we conduct three experiments, interpreting brightness in the face skin region either as median pixel value or as the distribution of pixel values. Balancing based on median brightness alone yields up to a 46.8% decrease in d', while balancing based on brightness distribution yields up to a 57.6% decrease. In all three cases, the similarity scores of the individual distributions improve, with mean scores maximally improving 5.9% for Caucasian females and 3.7% for African American females.

3.7CVDec 7, 2024

Impact of Sunglasses on One-to-Many Facial Identification Accuracy

Sicong Tian, Haiyu Wu, Michael C. King et al.

One-to-many facial identification is documented to achieve high accuracy in the case where both the probe and the gallery are "mugshot quality" images. However, an increasing number of documented instances of wrongful arrest following one-to-many facial identification have raised questions about its accuracy. Probe images used in one-to-many facial identification are often cropped from frames of surveillance video and deviate from "mugshot quality" in various ways. This paper systematically explores how the accuracy of one-to-many facial identification is degraded by the person in the probe image choosing to wear dark sunglasses. We show that sunglasses degrade accuracy for mugshot-quality images by an amount similar to strong blur or noticeably lower resolution. Further, we demonstrate that the combination of sunglasses with blur or lower resolution results in even more pronounced loss in accuracy. These results have important implications for developing objective criteria to qualify a probe image for the level of accuracy to be expected if it used for one-to-many identification. To ameliorate the accuracy degradation caused by dark sunglasses, we show that it is possible to recover about 38% of the lost accuracy by synthetically adding sunglasses to all the gallery images, without model re-training. We also show that the frequency of wearing-sunglasses images is very low in existing training sets, and that increasing the representation of wearing-sunglasses images can greatly reduce the error rate. The image set assembled for this research is available at https://cvrl.nd.edu/projects/data/ to support replication and further research.

3.6CVMar 25, 2025

Peepers & Pixels: Human Recognition Accuracy on Low Resolution Faces

Xavier Merino, Gabriella Pangelinan, Samuel Langborgh et al.

Automated one-to-many (1:N) face recognition is a powerful investigative tool commonly used by law enforcement agencies. In this context, potential matches resulting from automated 1:N recognition are reviewed by human examiners prior to possible use as investigative leads. While automated 1:N recognition can achieve near-perfect accuracy under ideal imaging conditions, operational scenarios may necessitate the use of surveillance imagery, which is often degraded in various quality dimensions. One important quality dimension is image resolution, typically quantified by the number of pixels on the face. The common metric for this is inter-pupillary distance (IPD), which measures the number of pixels between the pupils. Low IPD is known to degrade the accuracy of automated face recognition. However, the threshold IPD for reliability in human face recognition remains undefined. This study aims to explore the boundaries of human recognition accuracy by systematically testing accuracy across a range of IPD values. We find that at low IPDs (10px, 5px), human accuracy is at or below chance levels (50.7%, 35.9%), even as confidence in decision-making remains relatively high (77%, 70.7%). Our findings indicate that, for low IPD images, human recognition ability could be a limiting factor to overall system accuracy.

1.5CVMay 10, 2023

Analysis of Adversarial Image Manipulations

Ahsi Lo, Gabriella Pangelinan, Michael C. King

As virtual and physical identity grow increasingly intertwined, the importance of privacy and security in the online sphere becomes paramount. In recent years, multiple news stories have emerged of private companies scraping web content and doing research with or selling the data. Images uploaded online can be scraped without users' consent or knowledge. Users of social media platforms whose images are scraped may be at risk of being identified in other uploaded images or in real-world identification situations. This paper investigates how simple, accessible image manipulation techniques affect the accuracy of facial recognition software in identifying an individual's various face images based on one unique image.

14.4CVDec 29, 2021

Gendered Differences in Face Recognition Accuracy Explained by Hairstyles, Makeup, and Facial Morphology

Vítor Albiero, Kai Zhang, Michael C. King et al.

Media reports have accused face recognition of being ''biased'', ''sexist'' and ''racist''. There is consensus in the research literature that face recognition accuracy is lower for females, who often have both a higher false match rate and a higher false non-match rate. However, there is little published research aimed at identifying the cause of lower accuracy for females. For instance, the 2019 Face Recognition Vendor Test that documents lower female accuracy across a broad range of algorithms and datasets also lists ''Analyze cause and effect'' under the heading ''What we did not do''. We present the first experimental analysis to identify major causes of lower face recognition accuracy for females on datasets where previous research has observed this result. Controlling for equal amount of visible face in the test images mitigates the apparent higher false non-match rate for females. Additional analysis shows that makeup-balanced datasets further improves females to achieve lower false non-match rates. Finally, a clustering experiment suggests that images of two different females are inherently more similar than of two different males, potentially accounting for a difference in false match rates.

10.0CVApr 29, 2021

Analysis of Manual and Automated Skin Tone Assignments for Face Recognition Applications

KS Krishnapriya, Michael C. King, Kevin W. Bowyer

News reports have suggested that darker skin tone causes an increase in face recognition errors. The Fitzpatrick scale is widely used in dermatology to classify sensitivity to sun exposure and skin tone. In this paper, we analyze a set of manual Fitzpatrick skin type assignments and also employ the individual typology angle to automatically estimate the skin tone from face images. The set of manual skin tone rating experiments shows that there are inconsistencies between human raters that are difficult to eliminate. Efforts to automate skin tone rating suggest that it is particularly challenging on images collected without a calibration object in the scene. However, after the color-correction, the level of agreement between automated and manual approaches is found to be 96% or better for the MORPH images. To our knowledge, this is the first work to: (a) examine the consistency of manual skin tone ratings across observers, (b) document that there is substantial variation in the rating of the same image by different observers even when exemplar images are given for guidance and all images are color-corrected, and (c) compare manual versus automated skin tone ratings.

6.5CVJun 6, 2020

The Criminality From Face Illusion

Kevin W. Bowyer, Michael King, Walter Scheirer et al.

The automatic analysis of face images can generate predictions about a person's gender, age, race, facial expression, body mass index, and various other indices and conditions. A few recent publications have claimed success in analyzing an image of a person's face in order to predict the person's status as Criminal / Non-Criminal. Predicting criminality from face may initially seem similar to other facial analytics, but we argue that attempts to create a criminality-from-face algorithm are necessarily doomed to fail, that apparently promising experimental results in recent publications are an illusion resulting from inadequate experimental design, and that there is potentially a large social cost to belief in the criminality from face illusion.

10.6CVNov 14, 2019

Does Face Recognition Accuracy Get Better With Age? Deep Face Matchers Say No

Vítor Albiero, Kevin W. Bowyer, Kushal Vangara et al.

Previous studies generally agree that face recognition accuracy is higher for older persons than for younger persons. But most previous studies were before the wave of deep learning matchers, and most considered accuracy only in terms of the verification rate for genuine pairs. This paper investigates accuracy for age groups 16-29, 30-49 and 50-70, using three modern deep CNN matchers, and considers differences in the impostor and genuine distributions as well as verification rates and ROC curves. We find that accuracy is lower for older persons and higher for younger persons. In contrast, a pre deep learning matcher on the same dataset shows the traditional result of higher accuracy for older persons, although its overall accuracy is much lower than that of the deep learning matchers. Comparing the impostor and genuine distributions, we conclude that impostor scores have a larger effect than genuine scores in causing lower accuracy for the older age group. We also investigate the effects of training data across the age groups. Our results show that fine-tuning the deep CNN models on additional images of older persons actually lowers accuracy for the older age group. Also, we fine-tune and train from scratch two models using age-balanced training datasets, and these results also show lower accuracy for older age group. These results argue that the lower accuracy for the older age group is not due to imbalance in the original training data.