CL | AI | Nov 8, 2023

LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models

arXiv:2311.04879v2 | 9 citations | h-index: 11 | Has Code
AI Analysis

This work addresses the need for longer context windows in LLMs for applications such as long-form text generation, though it is incremental: it builds on existing methods such as Position Interpolation and QLoRA.

The authors tackled the problem of extending the context length of large language models efficiently, achieving an extension from 4096 to up to 12k tokens for LLaMA2 models with only 1000 finetuning steps on a single 32GB V100 GPU, while maintaining competitive perplexity on benchmarks like PG19 and Proof-pile.

We present LongQLoRA, an efficient and effective method to extend the context length of large language models with fewer training resources. LongQLoRA combines the advantages of Position Interpolation, QLoRA and Shift Short Attention of LongLoRA. With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k within 1000 finetuning steps. LongQLoRA achieves competitive perplexity on the PG19 and Proof-pile datasets; our model outperforms LongLoRA and is very close to MPT-7B-8K within the evaluation context length of 8192. We collect and build 39k long instruction data to extend the context length of Vicuna-13B from 4096 to 8192 and achieve good performance in both long and short context generation tasks. We also conduct ablation experiments to study the effect of LoRA rank, finetuning steps and attention patterns in inference. The model weights, training data and code are available at https://github.com/yangjianxin1/LongQLoRA.
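
To make the Position Interpolation component concrete, here is a minimal sketch of how rotary position angles can be rescaled so that an 8192-token window maps back into the 4096 positions seen during pretraining. It is an illustration under stated assumptions, not code from the LongQLoRA repository: the function names `interpolation_scale` and `rope_angles`, the `head_dim` of 128 and the rotary `base` of 10000 are all hypothetical choices made for the example.

```python
def interpolation_scale(original_len, target_len):
    """Factor that squeezes target_len positions into original_len (assumed helper)."""
    return original_len / target_len


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotary angles for one token position.

    With scale < 1 the position index is shrunk before the angles are
    computed, which is the core idea of Position Interpolation.
    """
    scaled_pos = position * scale
    # One angle per pair of hidden dimensions, as in standard rotary embeddings.
    return [scaled_pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


# Extending a 4096-token model to 8192 tokens halves every position index.
scale = interpolation_scale(4096, 8192)

# The last position of the extended window lands inside the pretrained range.
last_angles = rope_angles(8191, head_dim=128, scale=scale)
```

With these assumptions, position 8000 in the extended model receives the same angles that position 4000 received in the original model, so the finetuning steps only need to adapt the model to denser positions rather than to unseen ones.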

Code Implementations: 2 repos