LG · AI · ML · Jun 17, 2019

Dataset shift quantification for credit card fraud detection

arXiv:1906.06977v1 · 25 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses dataset shift for credit card fraud detection, but it is incremental as it builds on existing methods with a modest gain.

The authors tackled the problem of dataset shift in credit card fraud detection by quantifying day-by-day shifts in transaction data, finding that shift patterns align with calendar events like holidays and weekends. Incorporating this shift knowledge as a feature led to a small improvement in fraud detection.

Machine learning and data mining techniques have been used extensively to detect credit card fraud. However, purchase behaviour and fraudster strategies may change over time. This phenomenon is named dataset shift or concept drift in the domain of fraud detection. In this paper, we present a method to quantify, day by day, the dataset shift in our dataset of face-to-face credit card transactions (card holder located in the shop). In practice, we classify the days against each other and measure the efficiency of the classification. The more efficient the classification, the more different the buying behaviour between two days, and vice versa. Therefore, we obtain a distance matrix characterizing the dataset shift. After an agglomerative clustering of the distance matrix, we observe that the dataset shift pattern matches the calendar events for this time period (holidays, weekends, etc.). We then incorporate this dataset shift knowledge in the credit card fraud detection task as a new feature. This leads to a small improvement in detection.
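The pipeline described in the abstract (classify days against each other, turn classification performance into a distance, cluster the distance matrix, reuse the cluster as a feature) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the two numeric features are hypothetical, a nearest-centroid classifier stands in for whatever classifier the paper uses, hold-out accuracy stands in for its performance measure, and average linkage is an assumed choice. All function names (`day_distance`, `distance_matrix`, `agglomerative`, `make_day`) are made up for this example.

```python
import random
from itertools import combinations

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def day_distance(day_a, day_b):
    """Distance between two days: hold-out accuracy of a nearest-centroid
    classifier separating them, rescaled so chance (0.5) maps to 0 and
    perfect separation maps to 1. Assumes days of similar size."""
    half_a, half_b = len(day_a) // 2, len(day_b) // 2
    c_a, c_b = centroid(day_a[:half_a]), centroid(day_b[:half_b])
    test = [(x, 0) for x in day_a[half_a:]] + [(x, 1) for x in day_b[half_b:]]
    correct = 0
    for x, label in test:
        predicted = 1 if sq_dist(x, c_b) < sq_dist(x, c_a) else 0
        correct += predicted == label
    return max(0.0, 2 * correct / len(test) - 1)

def distance_matrix(days):
    """Symmetric matrix of pairwise day distances (zero diagonal)."""
    n = len(days)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = day_distance(days[i], days[j])
    return dist

def agglomerative(dist, n_clusters):
    """Naive average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations yields a < b, so popping b leaves index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters.pop(b)
    return clusters

# Synthetic stand-in data: four "weekday"-like days and two "weekend"-like
# days whose (hypothetical) features are drawn from a shifted distribution.
rng = random.Random(0)

def make_day(center, n=200):
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]

days = [make_day(0.0) for _ in range(4)] + [make_day(3.0) for _ in range(2)]
dist = distance_matrix(days)
clusters = agglomerative(dist, n_clusters=2)

# The cluster index of each day can then be appended to that day's
# transactions as an extra categorical feature for the fraud classifier.
day_to_cluster = {day: k for k, members in enumerate(clusters) for day in members}
print(clusters, day_to_cluster)
```

On real data the number of clusters would not be known in advance; inspecting the dendrogram or the distance matrix against the calendar, as the abstract describes, is one way to choose it.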

