Open Banking Foundational Model: Learning Language Representations from Few Financial Transactions
This work addresses the challenge of learning from limited financial data in applications such as fraud prevention and credit risk assessment; it is an incremental advance in applying self-supervised models to the financial domain.
The paper tackles the problem of representing financial transactions in data-scarce Open Banking settings. It introduces a multimodal foundational model that integrates structured and unstructured data, and shows that the model outperforms classical methods and generalizes across institutions and geographies.
We introduced a multimodal foundational model for financial transactions that integrates both structured attributes and unstructured textual descriptions into a unified representation. By adapting masked language modeling to transaction sequences, we demonstrated that our approach not only outperforms classical feature engineering and discrete event sequence methods but is also particularly effective in data-scarce Open Banking scenarios. To our knowledge, this is the first large-scale study across thousands of financial institutions in North America, and it provides evidence that multimodal representations can generalize across geographies and institutions. These results highlight the potential of self-supervised models to advance financial applications ranging from fraud prevention and credit risk to customer insights.
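The text does not spell out how transactions are tokenized or masked. The sketch below shows one way masked language modeling could be adapted to transaction sequences, under the following assumptions. Each transaction is flattened into structured-attribute tokens followed by its lowercased description words. The structured attributes are a hypothetical amount bucket and a day-of-week token. Transactions are separated by `[SEP]`, and non-separator positions are selected for prediction using the standard BERT 80/10/10 mask/random/keep split. The `Transaction` fields, token names, and bucketing rule are illustrative assumptions, not the authors' design.

```python
import datetime
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

MASK = "[MASK]"
SEP = "[SEP]"


@dataclass
class Transaction:
    # Hypothetical schema: the paper's actual attributes are not given here.
    amount: float       # negative = money out, positive = money in
    posted_on: str      # ISO date, e.g. "2024-03-01"
    description: str    # raw bank-statement text


def amount_token(amount: float) -> str:
    # Hypothetical bucketing: sign plus number of integer digits (a crude log10 scale).
    sign = "DEBIT" if amount < 0 else "CREDIT"
    digits = len(str(int(abs(amount))))
    return f"[AMT_{sign}_{digits}]"


def serialize(tx: Transaction) -> List[str]:
    # Structured attributes become special tokens; the free-text description is
    # lowercased and whitespace-split (a stand-in for a real subword tokenizer).
    dow = datetime.date.fromisoformat(tx.posted_on).weekday()  # Monday=0 ... Sunday=6
    return [amount_token(tx.amount), f"[DOW_{dow}]"] + tx.description.lower().split()


def serialize_sequence(txs: List[Transaction]) -> List[str]:
    # Concatenate transactions into one sequence, closing each with [SEP].
    tokens: List[str] = []
    for tx in txs:
        tokens.extend(serialize(tx))
        tokens.append(SEP)
    return tokens


def mask_tokens(tokens: List[str], vocab: List[str], mask_prob: float = 0.15,
                seed: int = 0) -> Tuple[List[str], List[Optional[str]]]:
    # BERT-style masking: each non-[SEP] position is selected with probability
    # mask_prob. A selected token becomes [MASK] 80% of the time, a random
    # vocabulary token 10% of the time, and stays unchanged 10% of the time.
    # Labels hold the original token at selected positions and None elsewhere
    # (positions excluded from the loss).
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if tok != SEP and rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


txs = [
    Transaction(-42.50, "2024-03-01", "STARBUCKS #1234 SEATTLE WA"),
    Transaction(2500.00, "2024-03-02", "PAYROLL ACME CORP"),
]
tokens = serialize_sequence(txs)
vocab = sorted(set(tokens) - {SEP})
inputs, labels = mask_tokens(tokens, vocab, mask_prob=0.3, seed=7)
print(inputs)
print(labels)
```

A model trained on `inputs` to predict the non-`None` entries of `labels` must infer masked amounts, weekdays, or description words from the surrounding transactions. This is the kind of cross-modal signal a unified representation would need to capture. A production pipeline would more likely use a subword tokenizer and learned per-attribute embeddings. The string tokens here simply keep the sketch dependency-free.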