LG · AI · CL · ML · Nov 25, 2024

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

arXiv:2411.16525v2 · 18 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work gives practitioners designing prompt tuning methods fundamental theoretical limits, though it is incremental in that it extends known concepts to simplified transformers.

The paper tackles the statistical and computational limits of prompt tuning for single-head, single-layer transformers. Statistically, it proves that prompt tuning on these models is a universal approximator for sequence-to-sequence Lipschitz functions and gives an exponential lower bound on the number of soft-prompt tokens needed for memorization. Computationally, it identifies a phase transition in efficiency governed by the norm of the soft-prompt-induced keys and queries: almost-linear time inference algorithms exist within the stated criterion, and no sub-quadratic algorithm exists beyond it under SETH.

We investigate the statistical and computational limits of prompt tuning for transformer-based foundation models. Our key contributions are that prompt tuning on single-head transformers with only a single self-attention layer (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH). Statistically, we prove that prompt tuning on such simplest possible transformers is a universal approximator for sequence-to-sequence Lipschitz functions. In addition, we provide an exponential-in-$dL$ and -in-$(1/\epsilon)$ lower bound on the number of soft-prompt tokens required for prompt tuning to memorize any dataset with 1-layer, 1-head transformers. Computationally, we identify a phase transition in the efficiency of prompt tuning, determined by the norm of the soft-prompt-induced keys and queries, and provide an upper bound criterion. Beyond this criterion, no sub-quadratic (efficient) algorithm for prompt tuning exists under SETH. Within this criterion, we showcase our theory by proving the existence of almost-linear time prompt tuning inference algorithms. These fundamental limits provide important necessary conditions for practitioners designing expressive and efficient prompt tuning methods.
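To make the setting concrete, here is a minimal sketch of prompt tuning on a single-head, single-layer self-attention model, written in plain Python. It is an illustration under stated assumptions, not the paper's construction: the function names, the toy identity weights, and the choice of the largest absolute entry as the "norm" of the soft-prompt-induced queries and keys are all hypothetical, since the abstract does not specify the norm or the threshold. The sketch uses the naive quadratic-time attention, which is the baseline that the almost-linear time algorithms would improve on.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_tuned_attention(prompt, tokens, w_q, w_k, w_v):
    """Single-head, single-layer self-attention over [prompt; tokens].

    In prompt tuning only `prompt` is trained; w_q, w_k, w_v stay frozen.
    Returns the outputs at the data-token positions and the largest absolute
    entry among the queries and keys (an assumed stand-in for the norm that
    governs the efficiency phase transition).
    """
    x = prompt + tokens  # prepend the soft-prompt rows to the input rows
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(w_q[0]))
    # Naive attention: every position scores every other (quadratic in length).
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    attn = [softmax(r) for r in scores]
    out = matmul(attn, v)
    max_entry = max(abs(e) for mat in (q, k) for row in mat for e in row)
    return out[len(prompt):], max_entry


# Toy example: one soft-prompt token, two data tokens, identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
prompt = [[0.5, -0.5]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
out, max_entry = prompt_tuned_attention(prompt, tokens, identity, identity, identity)
```

Changing `prompt` changes both the outputs at the data positions and `max_entry`, which is the sense in which the soft prompt induces the keys and queries whose magnitude matters for efficiency.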

