LG NI ML Nov 28, 2018

Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data

Eunjeong Jeong, Seungeun Oh, Hyesung Kim, Jihong Park, Mehdi Bennis, Seong-Lyun Kim

arXiv:1811.11479v234.4742 citations

Originality: Incremental advance

AI Analysis

This paper addresses communication efficiency and data heterogeneity in on-device ML systems, offering an incremental improvement over existing federated learning methods.

The paper tackles two problems in on-device machine learning: high communication overhead and the performance degradation caused by non-IID data. It proposes federated distillation (FD) to shrink the communication payload and federated augmentation (FAug) to make each device's local data closer to IID. Combined, FD with FAug yields around 26x less communication overhead than federated learning (FL) while achieving 95-98% test accuracy relative to FL.

On-device machine learning (ML) enables the training process to exploit a massive amount of user-generated private data samples. To enjoy this benefit, inter-device communication overhead should be minimized. To this end, we propose federated distillation (FD), a distributed model training algorithm whose communication payload size is much smaller than that of a benchmark scheme, federated learning (FL), particularly when the model size is large. Moreover, user-generated data samples are likely to be non-IID across devices, which commonly degrades performance compared to the case with an IID dataset. To cope with this, we propose federated augmentation (FAug), where the devices collectively train a generative model, and each device thereby augments its local data towards yielding an IID dataset. Empirical studies demonstrate that FD with FAug yields around 26x less communication overhead while achieving 95-98% test accuracy compared to FL.
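The abstract's claim that FD's payload is much smaller than FL's for large models can be illustrated with a rough sketch. It assumes, as an illustration rather than a statement of the authors' exact protocol, that an FD device uploads one averaged logit vector per label while an FL device uploads every model parameter. The function names and the layer shapes below are hypothetical.

```python
import math


def per_label_mean_logits(logits, labels, num_labels):
    """Average a device's logit vectors, grouped by ground-truth label.

    Assumed FD payload: one length-num_labels vector per label.
    Labels with no local samples keep an all-zero row.
    """
    sums = [[0.0] * num_labels for _ in range(num_labels)]
    counts = [0] * num_labels
    for vec, label in zip(logits, labels):
        counts[label] += 1
        for k, value in enumerate(vec):
            sums[label][k] += value
    return [
        [s / counts[lbl] if counts[lbl] else 0.0 for s in sums[lbl]]
        for lbl in range(num_labels)
    ]


def fd_payload_size(num_labels):
    """Number of scalars uploaded per round under the assumed FD scheme."""
    return num_labels * num_labels


def fl_payload_size(layer_shapes):
    """Number of scalars uploaded per round when sending all parameters."""
    return sum(math.prod(shape) for shape in layer_shapes)


# Hypothetical two-layer classifier (not the model used in the paper).
layer_shapes = [(784, 128), (128,), (128, 10), (10,)]
fl_size = fl_payload_size(layer_shapes)
fd_size = fd_payload_size(num_labels=10)
print(f"FL uploads {fl_size} scalars, FD uploads {fd_size} scalars per round")
```

Under these assumptions the FD payload depends only on the number of labels, so it stays fixed as the model grows, whereas the FL payload scales with the parameter count. The ratio printed here is specific to the toy shapes and is not the paper's measured 26x figure, which also accounts for other factors.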

