LGAIFAFeb 22, 2024

Prompting a Pretrained Transformer Can Be a Universal Approximator

arXiv:2402.14753v120 citationsh-index: 12ICML
Originality Highly original
AI Analysis

This provides foundational theoretical guarantees for fine-tuning methods in AI, showing their universal approximation capabilities, which is crucial for researchers and practitioners in machine learning.

The paper tackles the theoretical question of whether prompting or prefix-tuning a pretrained transformer can universally approximate any sequence-to-sequence function, and demonstrates that even small models can achieve this with a single attention head and depth linear in sequence length.

Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of pretrained model by prompting or prefix-tuning it. Formally, whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, the attention mechanism is uniquely suited for universal approximation with prefix-tuning a single attention head being sufficient to approximate any continuous function. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes