LG · CR · ML · Nov 22, 2022

Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data

Microsoft
arXiv:2211.13116v2 · 5 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This paper addresses low performance, privacy concerns, and high communication overhead in federated learning on decentralized tabular data. It is an incremental improvement over existing data augmentation methods.

The paper tackles the challenge of non-IID data in federated learning for tabular data by proposing Fed-TDA, a method that synthesizes data using simple statistics like column distributions and global covariance, resulting in improved test performance and communication efficiency over state-of-the-art methods on five real-world datasets.

Non-independent and identically distributed (non-IID) data is a key challenge in federated learning (FL), which usually hampers the optimization convergence and the performance of FL. Existing data augmentation methods based on federated generative models or raw data sharing strategies for solving the non-IID problem still suffer from low performance, privacy protection concerns, and high communication overhead in decentralized tabular data. To tackle these challenges, we propose a federated tabular data augmentation method, named Fed-TDA. The core idea of Fed-TDA is to synthesize tabular data for data augmentation using some simple statistics (e.g., distributions of each column and global covariance). Specifically, we propose the multimodal distribution transformation and the inverse cumulative distribution mapping, which respectively synthesize continuous and discrete columns in tabular data from noise according to the pre-learned statistics. Furthermore, we show theoretically that Fed-TDA not only preserves data privacy but also maintains the distribution of the original data and the correlation between columns. Through extensive experiments on five real-world tabular datasets, we demonstrate the superiority of Fed-TDA over the state-of-the-art in test performance and communication efficiency.
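To make the idea of inverse cumulative distribution mapping concrete, here is a minimal sketch of how a discrete column could be synthesized from Gaussian noise using only per-category frequencies. It is an illustration under stated assumptions, not the authors' implementation: the function names (`normal_cdf`, `inverse_cdf_category`, `synthesize_discrete_column`), the choice of standard normal noise, and the example frequencies are all hypothetical, and the sketch ignores the cross-column covariance that the abstract also mentions.

```python
import bisect
import math
import random
from itertools import accumulate


def normal_cdf(z: float) -> float:
    """Standard normal CDF; maps a noise value z to a uniform value in [0, 1]."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_cdf_category(z, categories, frequencies):
    """Map one noise value to a category via the inverse of the empirical CDF.

    `frequencies` are the (assumed) pre-learned counts or probabilities of each
    category; they are normalized here so they sum to 1.
    """
    u = normal_cdf(z)
    total = sum(frequencies)
    cumulative = list(accumulate(f / total for f in frequencies))
    # First category whose cumulative probability reaches u.
    idx = bisect.bisect_left(cumulative, u)
    # Clamp to guard against floating-point round-off in the last bucket.
    return categories[min(idx, len(categories) - 1)]


def synthesize_discrete_column(n, categories, frequencies, seed=0):
    """Draw n synthetic values for a discrete column from Gaussian noise."""
    rng = random.Random(seed)
    return [
        inverse_cdf_category(rng.gauss(0.0, 1.0), categories, frequencies)
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical column with three categories and their shared frequencies.
    column = synthesize_discrete_column(10_000, ["a", "b", "c"], [0.5, 0.3, 0.2])
    for cat in ["a", "b", "c"]:
        print(cat, column.count(cat) / len(column))
```

Because the mapping only needs the category frequencies, a synthesizer of this kind never touches individual rows, which is the intuition behind the privacy and communication claims. Feeding correlated noise into several such mappings is one way the column correlations could be carried over, though that step is not shown here.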

Code Implementations: 1 repo