CL, AI · May 23, 2022

BBTv2: Towards a Gradient-Free Future with Large Language Models

arXiv:2205.11200v2 · 320 citations · h-index: 70
Originality: Incremental advance
AI Analysis

BBTv2 addresses practitioners' need to adapt large language models efficiently, though the advance is incremental because it builds on prior gradient-free tuning work.

The paper tackles the high cost of tuning large pre-trained models by proposing BBTv2, a gradient-free method that prepends continuous prompts to every layer and optimizes them with a divide-and-conquer algorithm. In few-shot settings, it achieves performance comparable to full model tuning and state-of-the-art parameter-efficient methods while using fewer tunable parameters.

Most downstream adaptation methods tune all or part of the parameters of pre-trained models (PTMs) through gradient descent, where the tuning cost increases linearly with model size. By contrast, gradient-free methods require only the forward computation of the PTM to tune the prompt, retaining the benefits of efficient tuning and deployment. However, past work on gradient-free tuning often introduces gradient descent to seek a good initialization of the prompt and lacks versatility across tasks and PTMs. In this paper, we present BBTv2, an improved version of Black-Box Tuning, to drive PTMs for few-shot learning. We prepend continuous prompts to every layer of the PTM and propose a divide-and-conquer gradient-free algorithm to optimize the prompts at different layers alternately. Extensive experiments across various tasks and PTMs show that BBTv2 can achieve performance comparable to full model tuning and state-of-the-art parameter-efficient methods (e.g., Adapter, LoRA, BitFit) under few-shot settings while maintaining far fewer tunable parameters.
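To make the alternating, layer-by-layer idea concrete, here is a minimal sketch under loud assumptions. It is not the BBTv2 implementation: `toy_loss` is a hypothetical stand-in for a PTM forward pass, the per-layer optimizer is a simple accept-if-better random search chosen for brevity, and every name and hyperparameter is illustrative. The only property it shares with the description above is that each layer's prompt is tuned in turn, with the others held fixed, using loss evaluations alone.

```python
import random


def toy_loss(prompts):
    """Hypothetical stand-in for a forward pass of a PTM; lower is better."""
    total = 0.0
    for j, p in enumerate(prompts):
        # Each layer's prompt has its own (made-up) target value.
        total += sum((x - 0.5 * (j + 1)) ** 2 for x in p)
    for a, b in zip(prompts, prompts[1:]):
        # Mild coupling between adjacent layers so the subproblems interact.
        total += 0.1 * sum((x - y) ** 2 for x, y in zip(a, b))
    return total


def optimize_layer(prompts, layer, loss_fn, rng, steps=20, sigma=0.3):
    """Tune one layer's prompt with random search; other layers stay fixed.

    Assumption: a simple accept-if-better search, used only for illustration.
    No gradients are computed; only loss_fn evaluations are used.
    """
    best = loss_fn(prompts)
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, sigma) for x in prompts[layer]]
        trial = prompts[:layer] + [candidate] + prompts[layer + 1:]
        loss = loss_fn(trial)
        if loss < best:
            prompts, best = trial, loss
    return prompts, best


def alternate_tuning(num_layers=4, dim=3, rounds=10, seed=0):
    """Cycle over the layers, optimizing one prompt at a time."""
    rng = random.Random(seed)
    prompts = [[0.0] * dim for _ in range(num_layers)]
    history = [toy_loss(prompts)]
    for _ in range(rounds):
        best = history[-1]
        for layer in range(num_layers):
            prompts, best = optimize_layer(prompts, layer, toy_loss, rng)
        history.append(best)  # loss after a full pass over all layers
    return prompts, history


if __name__ == "__main__":
    _, history = alternate_tuning()
    print("loss per round:", [round(h, 3) for h in history])
```

Because a candidate is accepted only when it lowers the loss, the recorded loss never increases from one round to the next. A real setup would replace `toy_loss` with a forward pass of the frozen model on few-shot data and would likely use a stronger derivative-free optimizer.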

Code Implementations: 1 repo
