CL, Feb 26, 2024

Investigating the Effectiveness of HyperTuning via Gisting

arXiv:2402.16817v1, 3 citations, h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of efficient hypernetwork training for few-shot learning, but it is incremental: it builds on the existing Gisting method and reports mixed performance.

The paper tackles the problem of training Transformer-based hypernetworks economically by introducing HyperLlama, a set of Gisting-based hypernetworks that generate task-specific soft prefixes from few-shot inputs. It finds that these hypernetworks underperform multi-task fine-tuned models that have full attention over the few-shot examples.

Gisting (Mu et al., 2023) is a simple method for training models to compress information into fewer token representations using a modified attention mask, and can serve as an economical approach to training Transformer-based hypernetworks. We introduce HyperLlama, a set of Gisting-based hypernetworks built on Llama-2 models that generates task-specific soft prefixes based on few-shot inputs. In experiments across P3, Super-NaturalInstructions and Symbol Tuning datasets, we show that HyperLlama models can effectively compress information from few-shot examples into soft prefixes. However, they still underperform multi-task fine-tuned language models with full attention over few-shot in-context examples. We also show that HyperLlama-generated soft prefixes can serve as better initializations for further prefix tuning. Overall, Gisting-based hypernetworks are economical and easy to implement, but have mixed empirical performance.
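
The "modified attention mask" mentioned above can be illustrated with a small sketch. It assumes a decoder-only layout in which the context to be compressed (here, the few-shot examples) comes first, followed by a few gist tokens, followed by the remaining tokens. Positions after the gist tokens are blocked from attending to the context, so any information they need must pass through the gist positions. The function name `gist_attention_mask` and its parameters are hypothetical and chosen for illustration; this is not the HyperLlama code.

```python
def gist_attention_mask(n_context, n_gist, n_rest):
    """Build a Gisting-style attention mask as a list of boolean rows.

    mask[i][j] is True when position i may attend to position j.
    Assumed layout: [context tokens][gist tokens][remaining tokens].
    """
    n = n_context + n_gist + n_rest
    gist_end = n_context + n_gist
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            allowed = j <= i  # ordinary causal attention
            # Tokens after the gist span may not look at the raw context;
            # they only see the gist tokens and later positions.
            if i >= gist_end and j < n_context:
                allowed = False
            row.append(allowed)
        mask.append(row)
    return mask


# Illustrative sizes: 3 context tokens, 2 gist tokens, 2 remaining tokens.
mask = gist_attention_mask(n_context=3, n_gist=2, n_rest=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

In this layout the gist tokens still attend to the full context, while the last two rows of the mask have zeros in the context columns. The hidden states at the gist positions then play the role of a compressed, prefix-like summary of the few-shot examples.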

Code Implementations: 1 repo