CRNov 19, 2023Code
An algorithm for forensic toolmark comparisonsMaria Cuellar, Sheng Gao, Heike Hofmann
Forensic toolmark analysis traditionally relies on subjective human judgment, leading to inconsistencies and lack of transparency. The multitude of variables, including angles and directions of mark generation, further complicates comparisons. To address this, we first generate a dataset of 3D toolmarks from various angles and directions using consecutively manufactured slotted screwdrivers. By using PAM clustering, we find that there is clustering by tool rather than angle or direction. Using Known Match and Known Non-Match densities, we establish thresholds for classification. Fitting Beta distributions to the densities, we allow for the derivation of likelihood ratios for new toolmark pairs. With a cross-validated sensitivity of 98% and specificity of 96%, our approach enhances the reliability of toolmark analysis. This approach is applicable to slotted screwdrivers, and for screwdrivers that are made with a similar production method. With data collection of other tools and factors, it could be applied to compare toolmarks of other types. This empirically trained, open-source solution offers forensic examiners a standardized means to objectively compare toolmarks, potentially decreasing the number of miscarriages of justice in the legal system.
CVMay 20, 2025
Accuracy and Fairness of Facial Recognition Technology in Low-Quality Police Images: An Experiment With Synthetic FacesMaria Cuellar, Hon Kiu, To et al.
Facial recognition technology (FRT) is increasingly used in criminal investigations, yet most evaluations of its accuracy rely on high-quality images, unlike those often encountered by law enforcement. This study examines how five common forms of image degradation--contrast, brightness, motion blur, pose shift, and resolution--affect FRT accuracy and fairness across demographic groups. Using synthetic faces generated by StyleGAN3 and labeled with FairFace, we simulate degraded images and evaluate performance using Deepface with ArcFace loss in 1:n identification tasks. We perform an experiment and find that false positive rates peak near baseline image quality, while false negatives increase as degradation intensifies--especially with blur and low resolution. Error rates are consistently higher for women and Black individuals, with Black females most affected. These disparities raise concerns about fairness and reliability when FRT is used in real-world investigative contexts. Nevertheless, even under the most challenging conditions and for the most affected subgroups, FRT accuracy remains substantially higher than that of many traditional forensic methods. This suggests that, if appropriately validated and regulated, FRT should be considered a valuable investigative tool. However, algorithmic accuracy alone is not sufficient: we must also evaluate how FRT is used in practice, including user-driven data manipulation. Such cases underscore the need for transparency and oversight in FRT deployment to ensure both fairness and forensic validity.