LG | Jan 3, 2024

Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences

Piotr Skalski, David Sutton, Stuart Burrell, Iker Perez, Jason Wong

arXiv:2401.01641v2 | 11.59 citations | h-index: 7 | Has Code | ICAIF

Originality: Incremental advance

AI Analysis

This addresses the problem of limited labeled data for financial applications like fraud detection. The advance is incremental: it adapts existing generative pretraining techniques to a new domain, financial transaction sequences.

The paper tackles the lack of self-supervised generative models for financial transaction time series. It introduces a pretraining method that produces contextualized embeddings of transactions. On public benchmarks, these embeddings outperform state-of-the-art self-supervised methods on a range of downstream tasks. In card fraud detection, after pretraining on 5.1 billion transactions from 180 issuing banks, they significantly improve the value detection rate at high precision thresholds.

Machine learning models underpin many modern financial systems for use cases such as fraud detection and churn prediction. Most are based on supervised learning with hand-engineered features, which relies heavily on the availability of labelled data. Large self-supervised generative models have shown tremendous success in natural language processing and computer vision, yet so far they haven't been adapted to multivariate time series of financial transactions. In this paper, we present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions. Benchmarks on public datasets demonstrate that it outperforms state-of-the-art self-supervised methods on a range of downstream tasks. We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions and apply it to the card fraud detection problem on hold-out datasets. The embedding model significantly improves value detection rate at high precision thresholds and transfers well to out-of-domain distributions.
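The abstract does not define "value detection rate at high precision thresholds". As a rough illustration only, the sketch below assumes it means the share of fraudulent monetary value that is flagged at a score cut-off whose precision meets a minimum target. The function name `value_detection_rate` and its arguments are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def value_detection_rate(
    scores: Sequence[float],
    is_fraud: Sequence[int],
    amounts: Sequence[float],
    min_precision: float,
) -> float:
    """Best share of fraud value caught at any score cut-off
    whose precision is at least `min_precision` (assumed definition)."""
    total_fraud_value = sum(a for a, y in zip(amounts, is_fraud) if y)
    if total_fraud_value == 0:
        return 0.0

    # Walk transactions from highest to lowest score, flagging the top k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = 0.0
    flagged = 0
    true_pos = 0
    caught_value = 0.0
    for rank, i in enumerate(order):
        flagged += 1
        if is_fraud[i]:
            true_pos += 1
            caught_value += amounts[i]
        # Evaluate only where the score changes, so tied scores are flagged together.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            if true_pos / flagged >= min_precision:
                best = max(best, caught_value / total_fraud_value)
    return best


# Toy data: five transactions, three of them fraudulent.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
is_fraud = [1, 1, 0, 1, 0]
amounts = [100.0, 50.0, 20.0, 250.0, 10.0]
print(value_detection_rate(scores, is_fraud, amounts, min_precision=0.9))
```

Under this assumed definition, a stricter precision target can only lower or preserve the value caught, which is why gains at high precision thresholds are the harder ones to obtain.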
