LG, ML, Apr 20, 2018

Streaming Active Learning Strategies for Real-Life Credit Card Fraud Detection: Assessment and Visualization

arXiv:1804.07481v1, 98 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of efficient fraud detection for financial institutions, but it is incremental as it applies known active learning methods to a specific domain.

The paper tackles the challenge of selecting which credit card transactions to label for fraud detection in a streaming, imbalanced, and non-stationary environment. It compares several active learning strategies by their fraud detection accuracy and highlights an exploitation/exploration trade-off among them.

Credit card fraud detection is a very challenging problem because of the specific nature of transaction data and the labeling process. Transaction data are peculiar because they arrive in a streaming fashion, are strongly imbalanced, and are prone to non-stationarity. The labeling is the outcome of an active learning process: every day, human investigators contact only a small number of cardholders (those associated with the riskiest transactions) and obtain the class (fraud or genuine) of the related transactions. An adequate selection of the set of cardholders is therefore crucial for an efficient fraud detection process. In this paper, we present a number of active learning strategies and investigate their fraud detection accuracies. We compare different criteria (supervised, semi-supervised, and unsupervised) to query unlabeled transactions. Finally, we highlight the existence of an exploitation/exploration trade-off for active learning in the context of fraud detection, which has so far been overlooked in the literature.
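To make the exploitation/exploration trade-off concrete, here is a minimal sketch of one possible daily query rule. It is an illustration only, not a strategy taken from the paper. It spends most of the investigators' budget on the highest-risk transactions (exploitation) and the rest on randomly chosen ones (exploration). The function name `select_daily_queries`, the `explore_fraction` parameter, and the example scores are all hypothetical.

```python
import random


def select_daily_queries(risk_scores, budget, explore_fraction=0.2, seed=0):
    """Pick the indices of transactions to send to investigators today.

    risk_scores: model-estimated fraud scores, one per transaction (assumed input).
    budget: how many transactions investigators can check in a day.
    explore_fraction: share of the budget spent on random picks (assumed knob).
    """
    budget = min(budget, len(risk_scores))
    n_explore = int(budget * explore_fraction)
    n_exploit = budget - n_explore

    # Exploitation: query the highest-risk transactions first.
    ranked = sorted(range(len(risk_scores)),
                    key=lambda i: risk_scores[i], reverse=True)
    exploit = ranked[:n_exploit]

    # Exploration: uniform random picks among the transactions not yet chosen.
    rng = random.Random(seed)
    explore = rng.sample(ranked[n_exploit:], n_explore)

    return exploit + explore


# Hypothetical scores for ten transactions in one day's stream.
scores = [0.9, 0.1, 0.8, 0.05, 0.3, 0.7, 0.2, 0.6, 0.4, 0.15]
queried = select_daily_queries(scores, budget=5, explore_fraction=0.2)
pure_exploit = select_daily_queries(scores, budget=5, explore_fraction=0.0)
```

With `explore_fraction=0.0` the rule reduces to querying only the riskiest transactions, which matches the investigator process described in the abstract. Larger values give up some immediate detections in exchange for labels from regions the current model considers low-risk.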

