LG · Aug 18, 2022

A Hybrid Self-Supervised Learning Framework for Vertical Federated Learning

arXiv:2208.08934v2 · 34 citations · h-index: 20 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses data efficiency and privacy concerns in VFL for enterprises, though it appears incremental as it builds on existing SSL methods within a federated context.

The paper tackles the problem of data deficiency in vertical federated learning (VFL) by proposing FedHSSL, a hybrid self-supervised learning framework that uses cross-party and local views to improve representation learning, achieving better privacy-utility trade-offs against state-of-the-art attacks compared to baselines.

Vertical federated learning (VFL), a variant of federated learning (FL), has recently drawn increasing attention because it matches enterprises' demand for leveraging more valuable features to achieve better model performance. However, conventional VFL methods may run into data deficiency because they exploit only aligned and labeled samples (belonging to different parties), often leaving the majority of unaligned and unlabeled samples unused. This data deficiency hampers the effort of the federation. In this work, we propose a Federated Hybrid Self-Supervised Learning framework, named FedHSSL, that utilizes cross-party views (i.e., dispersed features) of samples aligned among parties and local views (i.e., augmentation) of unaligned samples within each party to improve the representation learning capability of the VFL joint model. FedHSSL further exploits invariant features across parties to boost the performance of the joint model through partial model aggregation. FedHSSL, as a framework, can work with various representative SSL methods. We empirically demonstrate that FedHSSL methods outperform baselines by large margins. We provide an in-depth analysis of FedHSSL regarding label leakage, which is rarely investigated in existing self-supervised VFL works. The experimental results show that, with proper protection, FedHSSL achieves the best privacy-utility trade-off against the state-of-the-art label inference attack compared with baselines. Code is available at https://github.com/jorghyq2016/FedHSSL.
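
The three ingredients named in the abstract (cross-party views of aligned samples, local augmented views of a party's own samples, and partial model aggregation) can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The encoder shape, the Gaussian-noise augmentation, the negative-cosine objective, the choice of averaging only the top layer, and every name below (`encode`, `augment`, `average_top_layers`, and so on) are assumptions made for illustration; the linked repository is the reference for the actual method.

```python
import numpy as np


def neg_cosine(a, b):
    """Mean negative cosine similarity between matching rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(-np.mean(np.sum(a * b, axis=1)))


def augment(x, rng, noise=0.1):
    """Hypothetical local augmentation: additive Gaussian noise."""
    return x + noise * rng.standard_normal(x.shape)


def encode(x, w_bottom, w_top):
    """Toy encoder: party-specific bottom layer, then a shareable top layer."""
    return np.tanh(x @ w_bottom) @ w_top


def average_top_layers(tops):
    """Partial aggregation (assumed form): average only the top layers."""
    return np.mean(np.stack(tops), axis=0)


rng = np.random.default_rng(0)
n_aligned, n_local, hidden, dim = 8, 16, 6, 4

# Party A holds 5 features and party B holds 3 features of the same aligned samples.
x_a_aligned = rng.standard_normal((n_aligned, 5))
x_b_aligned = rng.standard_normal((n_aligned, 3))
# Party A also holds samples that are not aligned with party B.
x_a_local = rng.standard_normal((n_local, 5))

w_a_bottom = rng.standard_normal((5, hidden))
w_b_bottom = rng.standard_normal((3, hidden))
w_a_top = rng.standard_normal((hidden, dim))
w_b_top = rng.standard_normal((hidden, dim))

# Cross-party view: the two parties' representations of the same aligned samples.
cross_loss = neg_cosine(
    encode(x_a_aligned, w_a_bottom, w_a_top),
    encode(x_b_aligned, w_b_bottom, w_b_top),
)

# Local view: two augmentations of party A's own samples.
local_loss = neg_cosine(
    encode(augment(x_a_local, rng), w_a_bottom, w_a_top),
    encode(augment(x_a_local, rng), w_a_bottom, w_a_top),
)

# Hybrid objective for party A (equal weighting is an arbitrary choice here).
hybrid_loss_a = cross_loss + local_loss

# Partial aggregation: only the top layers are averaged; bottom layers stay local.
shared_top = average_top_layers([w_a_top, w_b_top])
```

In this sketch only `shared_top` would be exchanged between parties, while each bottom layer, whose input width depends on that party's feature set, never leaves its owner.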

Code Implementations: 1 repo
