Confidence Intervals for Error Rates in 1:1 Matching Tasks: Critical Statistical Analysis and Recommendations
This work addresses a practical statistical problem for researchers and practitioners in biometrics and other matching tasks. Its contribution is incremental: it reviews and refines existing methods rather than introducing a new paradigm.
The paper tackles the challenge of accurately assessing uncertainty in the error rates of 1:1 matching algorithms, such as those used for face verification. It reviews and analyzes methods for constructing confidence intervals and shows how coverage and interval width vary with factors such as sample size and data dependence.
Matching algorithms are commonly used to predict matches between items in a collection. For example, in 1:1 face verification, a matching algorithm predicts whether two face images depict the same person. Accurately assessing the uncertainty of the error rates of such algorithms can be challenging when data are dependent and error rates are low, two aspects that have often been overlooked in the literature. In this work, we review methods for constructing confidence intervals for error rates in 1:1 matching tasks. We derive and examine the statistical properties of these methods, demonstrating through both analysis and experiments on synthetic and real-world datasets how coverage and interval width vary with sample size, error rates, and degree of data dependence. Based on our findings, we provide recommendations for best practices for constructing confidence intervals for error rates in 1:1 matching tasks.
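To make the role of data dependence concrete, the following is a minimal Python sketch, not the paper's own procedure. It contrasts two generic textbook intervals for a false non-match rate: a Wilson score interval, which treats every comparison as independent, and a percentile bootstrap that resamples whole subjects. The function names, the toy data, and the restriction to genuine (same-person) comparisons grouped by subject are all assumptions made for illustration.

```python
import math
import random


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a proportion, assuming n independent trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def subject_bootstrap_interval(errors_by_subject, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap that resamples subjects, not individual comparisons.

    errors_by_subject maps a (hypothetical) subject id to a non-empty list of
    0/1 error indicators, one per genuine comparison involving that subject.
    """
    rng = random.Random(seed)
    subjects = list(errors_by_subject)
    rates = []
    for _ in range(n_boot):
        # Draw subjects with replacement and keep all of their comparisons.
        sample = rng.choices(subjects, k=len(subjects))
        num = sum(sum(errors_by_subject[s]) for s in sample)
        den = sum(len(errors_by_subject[s]) for s in sample)
        rates.append(num / den)
    rates.sort()
    lo = rates[int((alpha / 2) * n_boot)]
    hi = rates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data (made up): 50 subjects with 10 genuine comparisons each; errors are
# concentrated in 5 subjects, so comparisons are strongly dependent.
errors_by_subject = {
    f"s{i}": ([1] * 10 if i < 5 else [0] * 10) for i in range(50)
}
total_errors = sum(sum(v) for v in errors_by_subject.values())
total_comparisons = sum(len(v) for v in errors_by_subject.values())

print("Wilson (independence assumed):", wilson_interval(total_errors, total_comparisons))
print("Subject-level bootstrap:      ", subject_bootstrap_interval(errors_by_subject))
```

On clustered toy data like this, the subject-level interval should come out noticeably wider than the Wilson interval, because the effective sample size is closer to the number of subjects than to the number of comparisons. Impostor (different-person) comparisons involve two identities each, so their dependence structure is more involved and is not covered by this sketch.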