Device Tuning for Multi-Task Large Model
This work addresses efficiency limitations in deploying large models for multi-task scenarios, although it appears incremental, essentially an optimization of existing distributed approaches.
The paper tackles the high computational and memory costs of pre-training and fine-tuning self-attention models for multi-task learning by proposing Device Tuning, a framework that optimizes cloud-device collaboration with representation compression to reduce communication overhead.
Unsupervised pre-training approaches have achieved great success in many fields, such as Computer Vision (CV) and Natural Language Processing (NLP). However, compared with typical deep learning models, pre-training or even fine-tuning state-of-the-art self-attention models is extremely expensive, because they require far more computational and memory resources. This severely limits their application and success across a variety of domains, especially in multi-task learning. To improve efficiency, we propose Device Tuning for efficient multi-task models: a massively multi-task framework spanning cloud and device, designed to encourage learning representations that generalize better across many different tasks. Specifically, we design a Device Tuning architecture for multi-task models that benefits both cloud-side and device-side modelling and reduces device-cloud communication through representation compression. Experimental results demonstrate the effectiveness of the proposed method.
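The abstract does not say which representation-compression scheme Device Tuning uses. Purely as a hypothetical illustration of why compressing a device-side representation cuts upload traffic, the sketch below substitutes a plain, clearly named stand-in: symmetric 8-bit quantization of a float vector before it is sent to the cloud. The function names `compress`, `decompress`, and `float32_size` are invented for this example and are not the paper's method.

```python
import struct
from typing import List


def compress(rep: List[float]) -> bytes:
    """Hypothetical stand-in for representation compression:
    symmetric int8 quantization of a device-side representation."""
    max_abs = max((abs(x) for x in rep), default=0.0)
    # One scale for the whole vector; fall back to 1.0 for an all-zero vector.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in rep]
    # Payload layout: 4-byte little-endian float32 scale, then one signed byte per element.
    return struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)


def decompress(payload: bytes) -> List[float]:
    """Cloud side: rebuild an approximate float representation from the payload."""
    (scale,) = struct.unpack_from("<f", payload, 0)
    n = len(payload) - 4
    q = struct.unpack_from(f"<{n}b", payload, 4)
    return [v * scale for v in q]


def float32_size(rep: List[float]) -> int:
    """Bytes needed to send the uncompressed representation as float32."""
    return 4 * len(rep)


if __name__ == "__main__":
    rep = [0.5, -1.27, 0.0, 0.9] * 64  # toy 256-dimensional representation
    payload = compress(rep)
    restored = decompress(payload)
    print(float32_size(rep), len(payload))  # 1024 bytes vs. 260 bytes
    print(max(abs(a - b) for a, b in zip(rep, restored)))  # small quantization error
```

Under this assumed scheme, a 256-dimensional representation shrinks from 1024 bytes (float32) to 260 bytes (int8 values plus one scale), roughly a 4x reduction, at the cost of a per-element error of about half the scale. A learned compressor, such as a low-dimensional bottleneck layer trained jointly with the tasks, is another plausible option; the abstract leaves this choice unspecified.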