LG, ML · Aug 4, 2023

FPR Estimation for Fraud Detection in the Presence of Class-Conditional Label Noise

arXiv:2308.02695v1
h-index: 4
Originality: Synthesis-oriented
AI Analysis

The paper addresses a critical issue for fraud prevention systems, where accurate FPR estimates are needed to protect the customer experience. Its contribution is incremental, since it builds on existing noise-cleaning approaches.

The paper tackles the problem of estimating false-positive rates (FPR) in fraud detection when the validation data has asymmetric label noise. It shows that using the model to clean its own validation data leads to underestimates of FPR even when the total cleaning error is low.

We consider the problem of estimating the false-/true-positive rate (FPR/TPR) for a binary classification model when there are incorrect labels (label noise) in the validation set. Our motivating application is fraud prevention, where accurate estimates of FPR are critical to preserving the experience for good customers, and where label noise is highly asymmetric. Existing methods seek to minimize the total error in the cleaning process: to avoid cleaning examples that are not noise, and to ensure cleaning of examples that are. This is an important measure of accuracy but is insufficient to guarantee good estimates of the true FPR or TPR for a model, and we show that using the model to directly clean its own validation data leads to underestimates even if total error is low. This indicates a need for researchers to pursue methods that not only reduce total error but also seek to de-correlate cleaning error with model scores.
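The effect described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under assumed conditions, not the paper's experiment: the Gaussian score model, the noise direction (only true fraud is mislabeled as legitimate), and every name and parameter (`simulate`, `fraud_rate`, `noise_rate`, `clean_cut`) are hypothetical choices made for illustration. It compares the FPR measured against true labels, noisy labels, and labels "cleaned" by relabeling high-scoring negatives using the model's own scores.

```python
import numpy as np


def fpr(scores, labels, threshold):
    """Fraction of examples labeled negative (0) that score at or above the threshold."""
    negatives = labels == 0
    return float(np.mean(scores[negatives] >= threshold))


def simulate(n=200_000, fraud_rate=0.05, noise_rate=0.3,
             threshold=1.0, clean_cut=1.5, seed=0):
    """Toy comparison of FPR under true, noisy, and self-cleaned labels.

    All distributions and parameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # True labels: 1 = fraud, 0 = legitimate.
    y_true = (rng.random(n) < fraud_rate).astype(int)

    # Assumed score model: fraud scores are shifted upward by 2.
    scores = rng.normal(loc=2.0 * y_true, scale=1.0)

    # Class-conditional noise (assumed direction): only true fraud is
    # mislabeled, and it is mislabeled as legitimate.
    flipped = (y_true == 1) & (rng.random(n) < noise_rate)
    y_noisy = np.where(flipped, 0, y_true)

    # Naive self-cleaning: relabel every high-scoring "legitimate" example
    # as fraud, using the same model scores that are being evaluated.
    relabel = (y_noisy == 0) & (scores >= clean_cut)
    y_cleaned = np.where(relabel, 1, y_noisy)

    return {
        "true": fpr(scores, y_true, threshold),
        "noisy": fpr(scores, y_noisy, threshold),
        "self_cleaned": fpr(scores, y_cleaned, threshold),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>12}: {value:.4f}")
```

In this setup the noisy labels inflate the FPR, because mislabeled fraud counts as false positives, while self-cleaning pushes the estimate below the true value, because the cleaning step also removes genuinely legitimate high-scoring examples. Those are exactly the examples that define the FPR, so the cleaning error is correlated with the model score.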

Code Implementations: 1 repo
