LG · AI · CL · Oct 24, 2023

Improving generalization in large language models by learning prefix subspaces

arXiv:2310.15793v1 · 3 citations · h-index: 4 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving LLM generalization in data-scarce settings. The contribution is incremental: it adapts an existing method from computer vision to LLMs, with modifications that make it compatible with pretrained transformers.

The paper tackles the problem of improving the generalization of large language models (LLMs) during few-shot fine-tuning. It adapts a neural network subspace optimization method to LLMs by combining it with parameter-efficient fine-tuning (PEFT) and learning continuous prefix subspaces. This yields a gain in average performance over state-of-the-art methods on a few-shot variant of the GLUE benchmark.

This article focuses on fine-tuning large language models (LLMs) in the scarce-data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses two challenges. First, their considerable number of parameters makes it difficult to train several models jointly; second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that Parameter-Efficient Fine-Tuning (PEFT) methods are perfectly compatible with this original approach, and we propose to learn an entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both contributions jointly lead to a gain in average performance compared to state-of-the-art methods. The implementation is available at https://github.com/Liloulou/prefix_subspace
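To make the idea of "jointly optimizing a simplex of prefixes" concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' implementation (see the repository linked above). Several assumptions are made purely for illustration: the frozen LLM is replaced by a toy frozen linear scorer `w`; each training step samples one point uniformly from the simplex (Dirichlet(1, ..., 1) weights); the sampled prefix is the convex combination of the vertex prefixes; and the simplex centroid is used as the single prefix at inference. All names (`sample_simplex_weights`, `train_prefix_simplex`, `DIM`, `K`, etc.) are hypothetical.

```python
import random

# Toy sizes (assumptions, not taken from the paper).
DIM = 4  # prefix dimension
K = 3    # number of simplex vertices (trainable prefixes)

def sample_simplex_weights(k, rng):
    """Draw convex weights uniformly from the (k-1)-simplex, i.e. Dirichlet(1,...,1),
    by normalizing i.i.d. Exponential(1) samples."""
    e = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(e)
    return [x / total for x in e]

def combine(vertices, alphas):
    """Point in the simplex: sum_k alpha_k * vertex_k."""
    return [sum(a * v[i] for a, v in zip(alphas, vertices))
            for i in range(len(vertices[0]))]

def loss_and_grad(prefix, w, target):
    """Squared error of a frozen toy linear 'model' that scores the prefix with w.
    Returns the loss and its gradient with respect to the prefix only."""
    pred = sum(p * wi for p, wi in zip(prefix, w))
    err = pred - target
    return err * err, [2.0 * err * wi for wi in w]

def train_prefix_simplex(w, target, steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    # Each vertex gets its own random initialization, so the simplex is not
    # degenerate (a single deterministic init would collapse all vertices).
    vertices = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(K)]
    for _ in range(steps):
        alphas = sample_simplex_weights(K, rng)
        prefix = combine(vertices, alphas)
        _, g = loss_and_grad(prefix, w, target)
        # Chain rule: d(prefix)/d(vertex_k) = alpha_k * I, so each vertex
        # receives the shared gradient scaled by its own weight.
        for a, v in zip(alphas, vertices):
            for i in range(DIM):
                v[i] -= lr * a * g[i]
    return vertices

def centroid(vertices):
    """Midpoint of the simplex, used here as the single prefix at inference."""
    k = len(vertices)
    return combine(vertices, [1.0 / k] * k)

if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 2.0]
    vertices = train_prefix_simplex(w, target=1.0)
    print("centroid loss:", loss_and_grad(centroid(vertices), w, 1.0)[0])
    print("per-vertex losses:", [loss_and_grad(v, w, 1.0)[0] for v in vertices])
```

In this toy, the expected loss over the simplex is minimized only when every vertex (and therefore every convex combination of them) fits the target, which mirrors the intuition that the whole region, not just one point, should sit in a low-loss area. Because only the small prefix vectors are replicated K times while the model stays frozen, the memory overhead of training several "models" jointly stays modest, which is the compatibility with PEFT that the abstract points to.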
