AdaSplit: Adaptive Trade-offs for Resource-constrained Distributed Deep Learning
This work addresses the compute and bandwidth constraints of low-resource clients, such as mobile and IoT devices, in distributed deep learning, offering an incremental improvement over existing split learning methods.
The paper tackles resource-constrained distributed deep learning, where split learning reduces client compute but increases bandwidth consumption and converges poorly when client data is heterogeneous. It introduces AdaSplit to reduce bandwidth consumption and improve performance, and validates the method through experiments under limited resources.
Distributed deep learning frameworks like federated learning (FL) and its variants are enabling personalized experiences across a wide range of web clients and mobile/IoT devices. However, FL-based frameworks are constrained by the computational resources available at clients, owing to the explosive growth in model parameters (e.g., billion-parameter models). Split learning (SL), a recent framework, reduces client compute load by splitting model training between client and server. This flexibility is extremely useful for low-compute setups, but it often comes at the cost of increased bandwidth consumption and may result in sub-optimal convergence, especially when client data is heterogeneous. In this work, we introduce AdaSplit, which enables efficient scaling of SL to low-resource scenarios by reducing bandwidth consumption and improving performance across heterogeneous clients. To capture and benchmark the multi-dimensional nature of distributed deep learning, we also introduce C3-Score, a metric to evaluate performance under resource budgets. We validate the effectiveness of AdaSplit under limited resources through extensive experimental comparison with strong federated and split learning baselines. We also present a sensitivity analysis of key design choices in AdaSplit, which validates its ability to provide adaptive trade-offs across variable resource budgets.
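The compute-versus-bandwidth trade-off between FL and vanilla SL described above can be illustrated with a rough back-of-the-envelope cost model. This is a simplified sketch and not AdaSplit or C3-Score: the `Setup` fields, the function names, and the example numbers are all hypothetical, and the model assumes float32 tensors, ignores label and client-weight traffic, and counts one FL round as one model download plus one upload.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT = 4  # assumption: all tensors are float32


@dataclass
class Setup:
    """Hypothetical per-client configuration for one training round."""
    total_params: int           # parameters in the full model
    client_params: int          # parameters in the client-side portion (SL)
    samples_per_client: int     # local training examples per client
    split_activation_size: int  # floats per example at the split layer
    local_epochs: int           # passes over the local data per round


def fl_bytes_per_round(s: Setup) -> int:
    """FL: download the global model, then upload the updated model."""
    return 2 * s.total_params * BYTES_PER_FLOAT


def sl_bytes_per_round(s: Setup) -> int:
    """Vanilla SL: per example, activations go up and gradients come back."""
    per_example = 2 * s.split_activation_size * BYTES_PER_FLOAT
    return per_example * s.samples_per_client * s.local_epochs


def client_param_fraction(s: Setup) -> float:
    """Share of the model the SL client trains (FL clients train all of it)."""
    return s.client_params / s.total_params


# Illustrative numbers only; they are not taken from the paper.
example = Setup(
    total_params=10_000_000,
    client_params=500_000,
    samples_per_client=50_000,
    split_activation_size=4096,
    local_epochs=1,
)

if __name__ == "__main__":
    print("FL bytes per round:", fl_bytes_per_round(example))
    print("SL bytes per round:", sl_bytes_per_round(example))
    print("SL client share of parameters:", client_param_fraction(example))
```

Under these made-up numbers the SL client holds only 5% of the parameters but exchanges roughly 20 times more data per round than the FL client, because SL traffic grows with the number of training examples rather than with model size.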