LG, CV · Jul 8, 2022

StatMix: Data augmentation method that relies on image statistics in federated learning

Dominik Lewy, Jacek Mańdziuk, Maria Ganzha, Marcin Paprzycki

arXiv:2207.04103v16.910 citations · h-index: 31

Originality: Synthesis-oriented

AI Analysis

StatMix addresses data privacy concerns for companies using federated learning, though the contribution is incremental, as it builds on existing augmentation techniques.

The authors tackled the problem of data scarcity and privacy in federated learning by proposing StatMix, a data augmentation method based on image statistics, which improved average accuracy in FL experiments on CIFAR-10 and CIFAR-100 compared to baseline training.

Availability of large amounts of annotated data is one of the pillars of deep learning success. Although numerous large datasets have been made available for research, this is often not the case in real-life applications (e.g., companies are unable to share data due to GDPR or concerns about intellectual property protection). Federated learning (FL) is a potential solution to this problem, as it enables training a global model on data scattered across multiple nodes without sharing the local data itself. However, even FL methods pose a threat to data privacy if not handled properly. Therefore, we propose StatMix, an augmentation approach that uses image statistics to improve the results of FL scenarios. StatMix is empirically tested on CIFAR-10 and CIFAR-100 using two neural network architectures. In all FL experiments, applying StatMix improves the average accuracy compared to baseline training (without StatMix). Some improvement can also be observed in non-FL setups.
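The abstract does not say which image statistics are used or how they are applied, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the statistics are per-channel means and standard deviations, that nodes exchange only these numbers (never pixels), and that a node augments a local image by rescaling it to statistics received from elsewhere. The names `channel_stats`, `restyle`, and `shared_pool` are hypothetical, and plain Python lists stand in for image arrays to keep the example dependency-free.

```python
import random
from statistics import fmean, pstdev


def channel_stats(image):
    """Return a (mean, std) pair per channel.

    `image` is a list of channels, each a flat list of pixel values.
    """
    return [(fmean(ch), pstdev(ch)) for ch in image]


def restyle(image, target_stats, eps=1e-8):
    """Standardise each channel with its own statistics, then rescale it
    so that it matches the target (mean, std) for that channel."""
    out = []
    for ch, (t_mean, t_std) in zip(image, target_stats):
        mean, std = fmean(ch), pstdev(ch)
        out.append([(p - mean) / (std + eps) * t_std + t_mean for p in ch])
    return out


# Hypothetical pool of statistics received from other nodes:
# one entry per remote image, three (mean, std) pairs per entry.
shared_pool = [[(0.5, 0.2), (0.4, 0.1), (0.3, 0.25)]]

# A toy 3-channel "image" with four pixels per channel.
local_image = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.0, 0.5, 0.5, 1.0],
]

# Augment the local image with statistics drawn from the shared pool.
augmented = restyle(local_image, random.choice(shared_pool))
```

Under these assumptions, only a handful of scalars per image leave a node, which is the kind of limited information sharing the abstract alludes to; whether this matches the exact procedure in the paper should be checked against the full text.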

