CR · LG · Oct 30, 2023

Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection

arXiv:2310.19304v1 · 5 citations · h-index: 53
Originality: Incremental advance
AI Analysis

This paper addresses privacy-preserving collaboration among financial institutions that face regulatory and competitive barriers, though it is incremental in that it combines existing techniques for a specific scenario.

The paper tackles financial anomaly detection when data is partitioned both vertically and horizontally across entities such as banks and a payment network, and trust among them is limited. It proposes PV4FAD, which combines homomorphic encryption, secure multi-party computation, differential privacy, and randomization to balance privacy and accuracy. The approach yields high-utility models by reducing the per-bank noise level and uses an ensemble (a random forest) to increase accuracy. The solution won second prize in the first phase of the U.S. PETs Prize Challenge.

The effective detection of evidence of financial anomalies requires collaboration among multiple entities who own a diverse set of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontally partitioned across the entities. However, in real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally and hence it is not possible to use existing FL approaches in a plug-and-play manner. Our novel solution, PV4FAD, combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), differential privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference threats at model deployment time. Our solution provides input privacy through HE and SMPC, and output privacy against inference time attacks through DP. Specifically, we show that, in the honest-but-curious threat model, banks do not learn any sensitive features about PNS transactions, and the PNS does not learn any information about the banks' dataset but only learns prediction labels. We also develop and analyze a DP mechanism to protect output privacy during inference. Our solution generates high-utility models by significantly reducing the per-bank noise level while satisfying distributed DP. To ensure high accuracy, our approach produces an ensemble model, in particular, a random forest. This enables us to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.
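
The abstract's point about reducing the per-bank noise level while satisfying distributed DP can be illustrated with a minimal Python sketch. This is not PV4FAD's actual mechanism. It assumes a Gaussian mechanism, assumes that secure aggregation (the role HE and SMPC play in the paper) reveals only the sum of the banks' contributions, and uses hypothetical names (`per_party_sigma`, `noisy_share`, `target_sigma`). Under those assumptions, each of n banks can add noise with standard deviation sigma/sqrt(n), because independent Gaussian noises add in variance, so the revealed sum still carries noise with standard deviation sigma.

```python
import math
import random
import statistics


def per_party_sigma(target_sigma: float, n_parties: int) -> float:
    """Noise std-dev each party uses so the SUM of all noises has std-dev target_sigma.

    Assumption: parties draw independent zero-mean Gaussian noise, so variances add.
    """
    return target_sigma / math.sqrt(n_parties)


def noisy_share(true_value: float, sigma_local: float, rng: random.Random) -> float:
    """One party's contribution: its true value plus its local Gaussian noise."""
    return true_value + rng.gauss(0.0, sigma_local)


# Hypothetical setting: 10 banks, aggregate noise level required by the DP analysis is 4.0.
rng = random.Random(0)
n_banks = 10
target_sigma = 4.0
sigma_local = per_party_sigma(target_sigma, n_banks)

# Empirical check: with all true values set to 0, the aggregate is pure noise,
# and its standard deviation should be close to target_sigma.
totals = []
for _ in range(20000):
    shares = [noisy_share(0.0, sigma_local, rng) for _ in range(n_banks)]
    totals.append(sum(shares))  # stands in for the securely aggregated sum
empirical_sigma = statistics.pstdev(totals)

print(f"per-bank sigma: {sigma_local:.3f}")
print(f"empirical aggregate sigma: {empirical_sigma:.3f}")
```

In this hypothetical setting each bank adds noise with standard deviation of about 1.26 instead of 4.0. The guarantee holds only for the aggregate: if an individual noisy share were revealed, its smaller noise alone would not meet the target privacy level, which is why the sketch depends on the secure-aggregation assumption.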

