LG CL · Dec 12, 2022

Federated Few-Shot Learning for Mobile NLP

Dongqi Cai, Shangguang Wang, Yaozong Wu, Felix Xiaozhu Lin, Mengwei Xu

Cambridge

arXiv:2212.05974v214.126 citations · h-index: 25 · Has Code

Originality: Incremental advance

AI Analysis

It addresses the key blocker for mobile NLP applications by enabling federated learning with minimal labeled data, though it is incremental as it retrofits existing techniques like pseudo labeling and prompt learning.

This work tackles the problem of federated NLP in few-shot scenarios where mobile users lack labeled data, achieving competitive accuracy with only 0.05% labeled data and reducing training delay, client energy, and network traffic by up to 46.0x, 41.2x, and 3000.0x, respectively.

Natural language processing (NLP) sees rich mobile applications. To support various language understanding tasks, a foundation NLP model is often fine-tuned in a federated, privacy-preserving setting (FL). This process currently relies on at least hundreds of thousands of labeled training samples from mobile clients, yet mobile users often lack the willingness or knowledge to label their data. Such an inadequacy of data labels is known as a few-shot scenario; it becomes the key blocker for mobile NLP applications. For the first time, this work investigates federated NLP in the few-shot scenario (FedFSL). By retrofitting algorithmic advances of pseudo labeling and prompt learning, we first establish a training pipeline that delivers competitive accuracy when only 0.05% (fewer than 100) of the training data is labeled and the remainder is unlabeled. To instantiate the workflow, we further present a system, FeS, that addresses the high execution cost with three novel designs: (1) curriculum pacing, which injects pseudo labels into the training workflow at a rate commensurate with the learning progress; (2) representational diversity, a mechanism for selecting the most learnable data, only for which pseudo labels will be generated; and (3) co-planning of a model's training depth and layer capacity. Together, these designs reduce the training delay, client energy, and network traffic by up to 46.0×, 41.2×, and 3000.0×, respectively. Through algorithm/system co-design, FFNLP demonstrates that FL can apply to challenging settings where most training samples are unlabeled.
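
To make the idea of pacing pseudo labels concrete, the following is a minimal sketch, not the authors' implementation. It assumes a generic confidence-thresholded pseudo-labeling step and a hypothetical linear schedule (`pacing_budget`) that uses the round index as a stand-in for learning progress; the abstract does not specify the actual pacing rule, the data-selection criterion, or any of the function names and parameters used here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pacing_budget(round_idx: int, start: int = 2, growth: int = 2, cap: int = 16) -> int:
    """Hypothetical linear pacing: allow more pseudo labels in later rounds.

    The round index is only an assumed proxy for learning progress.
    """
    return min(cap, start + growth * round_idx)


def select_pseudo_labels(
    logits_per_sample: Sequence[Sequence[float]],
    budget: int,
    min_confidence: float = 0.9,
) -> List[Tuple[int, int, float]]:
    """Return up to `budget` (sample_index, label, confidence) triples.

    Only predictions at or above `min_confidence` are kept, most confident first.
    """
    candidates = []
    for idx, logits in enumerate(logits_per_sample):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= min_confidence:
            candidates.append((idx, label, probs[label]))
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:budget]


if __name__ == "__main__":
    # Toy model outputs for four unlabeled samples (two classes).
    unlabeled_logits = [
        [4.0, 0.0],  # confident class 0
        [0.1, 0.0],  # uncertain, filtered out by the threshold
        [0.0, 5.0],  # confident class 1
        [3.0, 0.0],  # confident class 0
    ]
    for r in range(2):
        budget = pacing_budget(r, start=1, growth=1)
        picked = select_pseudo_labels(unlabeled_logits, budget)
        print(r, [(i, y) for i, y, _ in picked])
```

In this sketch, early rounds admit only the single most confident prediction and later rounds admit more, which mirrors the stated goal of injecting pseudo labels gradually rather than all at once. A real system would replace the round-index schedule with a measure of actual training progress.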
