LG CL DC

Dec 2, 2022

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning

Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen

IBM

arXiv:2212.01378v243.3249 citations

h-index: 59

Originality: Incremental advance

AI Analysis

The work addresses the challenge of improving models efficiently and at scale for AI practitioners, though it appears incremental because it builds on existing multitask and distributed training paradigms.

The paper tackles the problem of evolving pretrained models through distributed multitask finetuning without shared data, resulting in a model that outperforms RoBERTa by 2.33 points on average across 35 datasets and serves as a better starting point for unseen tasks.

We propose a new paradigm to continually evolve pretrained models, denoted ColD Fusion. It provides the benefits of multitask learning but leverages distributed computation with limited communication and eliminates the need for shared data. Consequently, ColD Fusion can give rise to a synergistic loop, where finetuned models can be recycled to continually improve the pretrained model they are based upon. We show that ColD Fusion yields comparable benefits to multitask training by producing a model that (a) attains strong performance on all of the datasets it was trained on; and (b) is a better starting point for finetuning on unseen datasets. We show that ColD Fusion outperforms RoBERTa and even previous multitask models. Specifically, when training and testing on 35 diverse datasets, a ColD Fusion-based model outperforms RoBERTa by 2.33 points on average without any changes to the architecture.
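The loop described in the abstract can be illustrated with a toy sketch. The abstract does not spell out the fusion operator, so the sketch assumes plain parameter averaging. It also assumes the "model" is a small weight vector and that each contributor's private task is a least-squares problem. The names `finetune`, `fuse`, `cold_fusion`, and `make_dataset` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import random


def finetune(weights, data, lr=0.1, steps=20):
    """Toy contributor step: SGD on a private least-squares task.

    Starts from the shared weights and never exposes `data` to anyone else.
    """
    w = list(weights)
    for _ in range(steps):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fuse(models):
    """Assumed fusion operator: average each parameter across contributors."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def cold_fusion(shared, private_datasets, iterations=5):
    """Repeat: every contributor finetunes the shared model, then results are fused."""
    for _ in range(iterations):
        contributions = [finetune(shared, d) for d in private_datasets]
        shared = fuse(contributions)  # only weights are communicated, not data
    return shared


def make_dataset(true_w, n, rng):
    """Synthetic noise-free regression data (an assumption for the demo)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        data.append((x, y))
    return data


rng = random.Random(0)
true_w = [1.0, -2.0, 0.5]
# Four contributors, each holding a private dataset.
datasets = [make_dataset(true_w, 20, rng) for _ in range(4)]
fused = cold_fusion([0.0, 0.0, 0.0], datasets)
```

In this toy setting all contributors share the same underlying target, which is a simplification: the paper's setting involves diverse datasets, where the fused model is expected to act as a better starting point rather than an exact solution for every task.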
