SELF: Self-Extend the Context Length With Logistic Growth Function
This work addresses a critical bottleneck for LLMs on long-context tasks, though the contribution appears incremental, since it extends existing position encoding methods.
The paper tackles the problem of large language models struggling with contexts longer than their training length by proposing SELF, a method that groups tokens using a logistic growth function. Compared to LongLM, the method reports improvements of up to 12% on LEval (with the Qwen model), 6.4% on LongBench summarization (with Llama-2-7b), and 5.4% on LEval reading comprehension.
Large language models degrade when operating on contexts longer than their training context length, owing to the standard position encoding for tokens in the attention layer. Tokens a long distance apart will rarely have an effect on each other, and long prompts yield unexpected results. To solve this problem, we propose SELF (Self-Extend the Context Length With Logistic Growth Function): a method that groups consecutive tokens at varying group sizes using a logistic capacity equation, combined with a constant group size at smaller relative distances. SELF improves performance by up to 12% over the LongLM extension method on LEval (specifically with the Qwen model). On summarization tasks in LongBench, SELF performs up to 6.4% better than LongLM (specifically with the Llama-2-7b model). On reading comprehension tasks from LEval, SELF performs up to 5.4% better than LongLM. Our code is available at https://github.com/alexeipc/SELF-LLM.
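The grouping idea can be illustrated with a minimal sketch. This is not the implementation from the repository above: the function names, the exact logistic form, and every parameter value (`capacity`, `rate`, `midpoint`, `window`, `near_size`) are assumptions chosen only for illustration. The sketch maps each relative distance to a compressed position index, using a constant group size at small distances and a logistically growing group size beyond them.

```python
import math


def logistic_group_size(distance, capacity=32.0, rate=0.01, midpoint=512.0):
    """Hypothetical logistic curve: grows from about 1 toward `capacity`."""
    return 1.0 + (capacity - 1.0) / (1.0 + math.exp(-rate * (distance - midpoint)))


def grouped_positions(max_distance, window=128, near_size=1, **curve_kwargs):
    """Map relative distances 0..max_distance to compressed position indices.

    Distances below `window` use the constant group size `near_size`
    (assumed to be 1 here). Beyond `window`, consecutive distances are
    packed into groups whose size follows the assumed logistic curve.
    """
    positions = []
    group_index = 0
    d = 0
    while d <= max_distance:
        if d < window:
            size = near_size
        else:
            size = max(1, round(logistic_group_size(d, **curve_kwargs)))
        # Every distance in the current group shares one position index.
        for _ in range(size):
            if d > max_distance:
                break
            positions.append(group_index)
            d += 1
        group_index += 1
    return positions


if __name__ == "__main__":
    pos = grouped_positions(4096)
    print("largest compressed position:", pos[-1])
```

Under these assumed settings, nearby tokens keep their exact relative positions, while distant tokens share positions in progressively larger groups, so a long context maps onto a much smaller range of position indices.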