LG | AI | Jan 27

Decoupled Split Learning via Auxiliary Loss

arXiv:2601.19261v1 | h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in distributed training for privacy-preserving machine learning, though it is incremental as it builds on existing split learning paradigms.

The paper tackles the high communication and memory overhead in traditional split learning by proposing a decoupled training method that uses local loss signals instead of end-to-end backpropagation, achieving performance on par with standard methods while reducing communication by 50% and peak memory usage by up to 58% on CIFAR datasets.
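The 50% communication figure follows from simple accounting: the gradient with respect to the cut-layer activations has the same shape as the activations themselves, so dropping the backward message halves the traffic across the split. The sketch below works through that arithmetic. The cut-layer shape, batch size, and the helper `bytes_per_iteration` are illustrative assumptions, not values from the paper, and the small cost of sending labels is ignored.

```python
def bytes_per_iteration(batch_size, activation_elems, bytes_per_elem=4, send_gradients=True):
    """Bytes exchanged across the split point in one training iteration.

    Hypothetical helper: counts only the activation tensor sent forward and,
    optionally, the equally sized gradient tensor sent back.
    """
    forward = batch_size * activation_elems * bytes_per_elem
    # The gradient w.r.t. the activations has the same shape as the activations.
    backward = forward if send_gradients else 0
    return forward + backward


# Assumed cut layer: 64 channels x 16 x 16 feature map, batch of 128, float32.
elems = 64 * 16 * 16
standard = bytes_per_iteration(128, elems, send_gradients=True)
decoupled = bytes_per_iteration(128, elems, send_gradients=False)
reduction = 1 - decoupled / standard

print(f"standard:  {standard / 2**20:.1f} MiB per iteration")
print(f"decoupled: {decoupled / 2**20:.1f} MiB per iteration")
print(f"reduction: {reduction:.0%}")
```

The memory saving of up to 58% reported in the paper depends on the architecture and the split point, so it cannot be derived from this kind of counting alone.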

Split learning is a distributed training paradigm where a neural network is partitioned between clients and a server, which allows data to remain at the client while only intermediate activations are shared. Traditional split learning relies on end-to-end backpropagation across the client-server split point. This incurs a large communication overhead (forward activations and backward gradients must be exchanged every iteration) and significant memory use (for storing activations and gradients). In this paper, we develop a beyond-backpropagation training method for split learning. In this approach, the client and server train their model partitions semi-independently, using local loss signals instead of propagated gradients. In particular, the client's network is augmented with a small auxiliary classifier at the split point to provide a local error signal, while the server trains on the client's transmitted activations using the true loss function. This decoupling removes the need to send backward gradients, which cuts communication costs roughly in half and also reduces memory overhead (as each side stores only the local activations for its own backward pass). We evaluate our approach on CIFAR-10 and CIFAR-100. Our experiments show two key results. First, the proposed approach achieves performance on par with standard split learning that uses backpropagation. Second, it reduces the communication cost of transmitting activations and gradients by 50% and peak memory usage by up to 58%.
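The decoupled update described in the abstract can be sketched as a single training step in PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the names `client_net`, `aux_head`, `server_net`, and `train_step`, the optimizer settings, and the choice to share labels with the server are all hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical partitions; the paper's actual architectures are not given here.
client_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
# Small auxiliary classifier attached at the split point (assumed design).
aux_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
server_net = nn.Sequential(
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

client_opt = torch.optim.SGD(
    list(client_net.parameters()) + list(aux_head.parameters()), lr=0.05
)
server_opt = torch.optim.SGD(server_net.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()


def train_step(x, y):
    # Client side: forward pass, local auxiliary loss, local backward pass.
    activations = client_net(x)
    client_loss = criterion(aux_head(activations), y)
    client_opt.zero_grad()
    client_loss.backward()
    client_opt.step()

    # Only detached activations cross the split (labels are assumed to be
    # shared too); no gradient is ever sent back to the client.
    received = activations.detach()

    # Server side: trains on the received activations with the true loss.
    server_loss = criterion(server_net(received), y)
    server_opt.zero_grad()
    server_loss.backward()
    server_opt.step()
    return client_loss.item(), server_loss.item(), received


# Random stand-in for a CIFAR-sized batch (3 x 32 x 32 images, 10 classes).
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
client_loss, server_loss, received = train_step(x, y)
print(f"client aux loss {client_loss:.3f}, server loss {server_loss:.3f}")
```

Because `received` is detached, the server's backward pass stops at the split point, so each side keeps only its own activations in memory, which is the mechanism the abstract credits for the memory reduction.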

