LG CR CV DC | Jul 1, 2022

Visual Transformer Meets CutMix for Improved Accuracy, Communication Efficiency, and Data Privacy in Split Learning

Sihun Baek, Jihong Park, Praneeth Vepakomma, Ramesh Raskar, Mehdi Bennis, Seong-Lyun Kim

arXiv:2207.00234v19.613 citations | h-index: 85

Originality: Incremental advance

AI Analysis

This work addresses scalability and privacy issues in distributed learning for large visual transformer models, offering an incremental improvement over existing split learning techniques.

The paper tackles the communication inefficiency and privacy risks of using split learning with visual transformers by proposing CutMixSL, which compresses and augments smashed data, achieving reduced communication costs and improved accuracy over baseline methods.

This article seeks a distributed learning solution for visual transformer (ViT) architectures. Compared to convolutional neural network (CNN) architectures, ViTs often have larger model sizes and are computationally expensive, making federated learning (FL) ill-suited. Split learning (SL) can circumvent this problem by splitting a model and communicating the hidden representations at the split layer, also known as smashed data. However, the smashed data of ViT are as large as, and as similar to, the input data, negating the communication efficiency of SL while violating data privacy. To resolve these issues, we propose a new form of smashed data, CutSmashed data, obtained by randomly punching and compressing the original smashed data. Leveraging this, we develop a novel SL framework for ViT, coined CutMixSL, which communicates CutSmashed data. CutMixSL not only reduces communication costs and privacy leakage, but also inherently involves CutMix data augmentation, improving accuracy and scalability. Simulations corroborate that CutMixSL outperforms baselines such as parallelized SL and SplitFed, which integrates FL with SL.
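
As a rough illustration of the idea in the abstract, the sketch below treats smashed data as a sequence of ViT patch tokens, "punches" it by keeping a random subset of patches, and then mixes two clients' tokens patch by patch with a CutMix-style soft label. This is not the authors' implementation: the function names `cut_smashed` and `mix_complementary`, the `keep_ratio` parameter, the complementary-mask mixing rule, and the label weighting by patch share are all assumptions made for illustration.

```python
import random


def cut_smashed(tokens, keep_ratio, rng):
    """Randomly 'punch' a patch-token sequence (hypothetical helper).

    Keeps a random subset of patch positions and returns
    (kept_indices, kept_tokens). Only these would be transmitted,
    which is where the assumed communication saving comes from.
    """
    n = len(tokens)
    k = max(1, round(n * keep_ratio))
    kept = sorted(rng.sample(range(n), k))
    return kept, [tokens[i] for i in kept]


def mix_complementary(tokens_a, tokens_b, label_a, label_b, keep_ratio, rng):
    """CutMix-style mixing at patch level (illustrative assumption).

    Client A contributes a random subset of patch positions and client B
    fills the remaining positions. The soft label is weighted by the
    share of patches taken from each client.
    """
    n = len(tokens_a)
    if len(tokens_b) != n:
        raise ValueError("both clients must send the same number of patches")
    kept_a, _ = cut_smashed(tokens_a, keep_ratio, rng)
    mask = set(kept_a)
    mixed = [tokens_a[i] if i in mask else tokens_b[i] for i in range(n)]
    lam = len(mask) / n  # fraction of patches that came from client A
    mixed_label = [lam * ya + (1 - lam) * yb for ya, yb in zip(label_a, label_b)]
    return mixed, mixed_label, lam


# Toy data: 16 patches with 4-dimensional tokens per client.
rng = random.Random(0)
tokens_a = [[1.0] * 4 for _ in range(16)]  # client A's smashed data
tokens_b = [[0.0] * 4 for _ in range(16)]  # client B's smashed data

kept_idx, kept_tok = cut_smashed(tokens_a, keep_ratio=0.5, rng=rng)
mixed, mixed_label, lam = mix_complementary(
    tokens_a, tokens_b, [1.0, 0.0], [0.0, 1.0], keep_ratio=0.5, rng=rng
)
print(len(kept_tok), "of", len(tokens_a), "patches kept; lambda =", lam)
```

With `keep_ratio=0.5`, each client would send half of its patch tokens plus their indices, and the mixed sequence seen on the server side contains patches from both clients, so no single client's representation is passed on intact.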

