LG · Jan 1, 2024

Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective

Din-Yin Hsieh, Chi-Hua Wang, Guang Cheng

arXiv:2401.00965v1 · 12.56 citations · h-index: 9

Originality: Synthesis-oriented

AI Analysis

This work provides practical guidelines for synthetic data practitioners in finance transitioning from real to synthetic datasets, though it is incremental in nature.

The paper addresses the challenge of generating synthetic credit card transaction time series with high fidelity and utility for machine learning tasks. It introduces five pre-processing schemas to improve training of the Conditional Probabilistic Auto-Regressive Model (CPAR), reports incremental improvements in fidelity and utility, and then uses the synthetic data to train fraud detection models.

Exploring generative model training for synthetic tabular data, specifically in sequential contexts such as credit card transaction data, presents significant challenges. This paper addresses these challenges, focusing on attaining both high fidelity to actual data and optimal utility for machine learning tasks. We introduce five pre-processing schemas to enhance the training of the Conditional Probabilistic Auto-Regressive Model (CPAR), demonstrating incremental improvements in the synthetic data's fidelity and utility. Upon achieving satisfactory fidelity levels, our attention shifts to training fraud detection models tailored for time-series data, evaluating the utility of the synthetic data. Our findings offer valuable insights and practical guidelines for synthetic data practitioners in the finance sector, transitioning from real to synthetic datasets for training purposes, and illuminating broader methodologies for synthesizing credit card transaction time series.
