CL AI · May 23, 2022

BBTv2: Towards a Gradient-Free Future with Large Language Models

Tianxiang Sun, Zhengfu He, Hong Qian, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu

arXiv:2205.11200v225.7320 citations · h-index: 70 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses practitioners' need for efficient adaptation of large language models, though it is an incremental advance because it builds on prior gradient-free tuning work.

The paper tackles the high cost of tuning large pre-trained models by proposing BBTv2, a gradient-free method that prepends continuous prompts to every layer and optimizes them with a divide-and-conquer algorithm. In few-shot settings, it achieves performance comparable to full model tuning and state-of-the-art parameter-efficient methods while using fewer tunable parameters.

Most downstream adaptation methods tune all or part of the parameters of pre-trained models (PTMs) through gradient descent, where the tuning cost increases linearly with the growth of the model size. By contrast, gradient-free methods require only the forward computation of the PTM to tune the prompt, retaining the benefits of efficient tuning and deployment. However, past work on gradient-free tuning often introduces gradient descent to seek a good initialization of the prompt and lacks versatility across tasks and PTMs. In this paper, we present BBTv2, an improved version of Black-Box Tuning, to drive PTMs for few-shot learning. We prepend continuous prompts to every layer of the PTM and propose a divide-and-conquer gradient-free algorithm to optimize the prompts at different layers alternately. Extensive experiments across various tasks and PTMs show that BBTv2 can achieve performance comparable to full model tuning and state-of-the-art parameter-efficient methods (e.g., Adapter, LoRA, BitFit) under few-shot settings while maintaining far fewer tunable parameters.
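The alternating, forward-only optimization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the optimizer, so plain random-perturbation search is swapped in here, and `toy_loss`, `alternate_layer_search`, and all parameter values are hypothetical names and settings chosen for illustration. The toy objective stands in for a loss that a real setup would obtain by running the PTM forward with the prompts prepended at each layer.

```python
import random


def toy_loss(prompts):
    """Hypothetical stand-in for a forward-only loss query.

    A real setup would run the PTM with these per-layer prompts and
    return the task loss; here each layer just has its own target value.
    """
    return sum(
        (p - (layer + 1) * 0.1) ** 2
        for layer, prompt in enumerate(prompts)
        for p in prompt
    )


def alternate_layer_search(loss_fn, num_layers=4, dim=8, rounds=20,
                           tries=30, sigma=0.1, seed=0):
    """Optimize one layer's prompt at a time using only loss evaluations.

    Assumption: simple random-perturbation search replaces whatever
    gradient-free optimizer the paper actually uses.
    """
    rng = random.Random(seed)
    prompts = [[0.0] * dim for _ in range(num_layers)]
    best = loss_fn(prompts)
    history = [best]
    for _ in range(rounds):
        # Divide: visit the layers in turn, holding the other prompts fixed.
        for layer in range(num_layers):
            for _ in range(tries):
                candidate = [p + rng.gauss(0.0, sigma) for p in prompts[layer]]
                trial = prompts[:layer] + [candidate] + prompts[layer + 1:]
                value = loss_fn(trial)  # forward computation only, no gradients
                if value < best:  # keep a candidate only if it improves the loss
                    prompts, best = trial, value
        history.append(best)
    return prompts, history


if __name__ == "__main__":
    final_prompts, history = alternate_layer_search(toy_loss)
    print(f"loss before: {history[0]:.4f}, loss after: {history[-1]:.4f}")
```

Because each step changes a single layer's prompt and accepts it only when the loss drops, the recorded loss never increases from one round to the next. The search at each step covers `dim` values rather than `num_layers * dim`, which is the point of splitting the problem by layer.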
