CL · AI · Feb 13

Doc-to-LoRA: Learning to Instantly Internalize Contexts

Rujikorn Charakorn, Edoardo Cetin, Shinnosuke Uesaka, Robert Tjarko Lange

arXiv:2602.15902v15.022 citations · h-index: 4

Originality: Highly original

AI Analysis

This work addresses the inefficiency of long-context processing in LLMs, a bottleneck for applications such as document understanding and reasoning, and offers a practical route to rapid adaptation.

The paper tackles the problem of high memory and latency costs in LLMs for long input sequences by proposing Doc-to-LoRA, a hypernetwork that meta-learns to generate LoRA adapters in a single forward pass, reducing inference latency and memory consumption. It achieves near-perfect zero-shot accuracy on a needle-in-a-haystack task with sequences over 4x the native context window and outperforms standard context distillation on real-world QA datasets.

Long input sequences are central to in-context learning, document understanding, and multi-step reasoning of Large Language Models (LLMs). However, the quadratic attention cost of Transformers makes inference memory-intensive and slow. While context distillation (CD) can transfer information into model parameters, per-prompt distillation is impractical due to training costs and latency. To address these limitations, we propose Doc-to-LoRA (D2L), a lightweight hypernetwork that meta-learns to perform approximate CD within a single forward pass. Given an unseen prompt, D2L generates a LoRA adapter for a target LLM, enabling subsequent queries to be answered without re-consuming the original context, reducing latency and KV-cache memory consumption during inference of the target LLM. On a long-context needle-in-a-haystack task, D2L successfully learns to map contexts into adapters that store the needle information, achieving near-perfect zero-shot accuracy at sequence lengths exceeding the target LLM's native context window by more than 4x. On real-world QA datasets with limited compute, D2L outperforms standard CD while significantly reducing peak memory consumption and update latency. We envision that D2L can facilitate rapid adaptation of LLMs, opening up the possibility of frequent knowledge updates and personalized chat behavior.
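The core idea in the abstract, a hypernetwork that maps a context to a LoRA adapter in one forward pass, can be illustrated with a toy NumPy sketch. This is not the authors' architecture: the mean-pooling step, the two linear heads `W_a` and `W_b`, the function names, and all dimensions are assumptions chosen for illustration, and the hypernetwork weights here are random rather than meta-learned. The sketch only shows the data flow: context embeddings go in, low-rank factors `A` and `B` come out, and later queries use the adapted weight without seeing the context again.

```python
import numpy as np

def init_hypernet(rng, ctx_dim, d_in, d_out, rank):
    """Random linear heads (hypothetical stand-in for a trained hypernetwork)."""
    scale = 0.02
    return {
        "W_a": rng.normal(0.0, scale, size=(ctx_dim, rank * d_in)),
        "W_b": rng.normal(0.0, scale, size=(ctx_dim, d_out * rank)),
    }

def generate_lora(hypernet, ctx_tokens, d_in, d_out, rank):
    """Single forward pass: pool context embeddings, emit LoRA factors A and B."""
    pooled = ctx_tokens.mean(axis=0)  # shape (ctx_dim,); pooling is an assumption
    A = (pooled @ hypernet["W_a"]).reshape(rank, d_in)
    B = (pooled @ hypernet["W_b"]).reshape(d_out, rank)
    return A, B

def adapted_linear(x, W, A, B, alpha=1.0):
    """Frozen weight W plus low-rank update: y = x W^T + alpha * x (B A)^T."""
    return x @ W.T + alpha * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
ctx_dim, d_in, d_out, rank = 16, 8, 8, 2

hypernet = init_hypernet(rng, ctx_dim, d_in, d_out, rank)
W = rng.normal(size=(d_out, d_in))           # frozen target-layer weight
ctx_tokens = rng.normal(size=(100, ctx_dim))  # embeddings of an unseen context

# The context is consumed once to produce the adapter.
A, B = generate_lora(hypernet, ctx_tokens, d_in, d_out, rank)

# Later queries use only W, A, and B; the context is not re-read.
query = rng.normal(size=(3, d_in))
y = adapted_linear(query, W, A, B)
```

Because the adapter is a pair of small matrices, its storage cost in this sketch depends on `rank`, `d_in`, and `d_out` rather than on the number of context tokens, which is the property the abstract relies on when it claims reduced KV-cache memory.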
