LG | AI | Mar 11, 2025

Training Plug-n-Play Knowledge Modules with Deep Context Distillation

arXiv:2503.08727v4 | 17 citations | h-index: 35
Originality: Incremental advance
AI Analysis

The paper addresses dynamic knowledge integration for large language models with a modular approach. Its contribution is an incremental improvement to the training objective.

The paper tackles the challenge of integrating new or evolving information into large language models after pre-training, particularly in low-data or private-document scenarios. It proposes document-level Knowledge Modules (KMs) trained with Deep Context Distillation, which outperform standard next-token prediction and pre-instruction training on two datasets.

Dynamically integrating new or rapidly evolving information after (Large) Language Model pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as the training objective for KMs. We instead propose Deep Context Distillation: we learn KM parameters so as to simulate the hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets. Finally, we highlight synergies between KMs and RAG.
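
To make the Deep Context Distillation objective concrete, here is a minimal sketch of a loss that pulls a student (the model with a KM plugged in, without the document in context) toward a teacher (the same model with the document in context). The abstract only states that both hidden states and logits are matched. The specific choices below are assumptions for illustration: KL divergence on the logits, an L1 distance on hidden states normalized by the teacher's L1 norm, and the hypothetical names `dcd_loss` and `hidden_weight`. The sketch scores a single token position using plain Python lists, whereas a real setup would use a tensor library and backpropagate into the LoRA parameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) between the two next-token distributions."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hidden_state_distance(teacher_hidden, student_hidden):
    """L1 distance between hidden vectors, scaled by the teacher's L1 norm.

    The normalization is an assumption; it keeps layers with large
    activations from dominating the sum.
    """
    num = sum(abs(t - s) for t, s in zip(teacher_hidden, student_hidden))
    den = sum(abs(t) for t in teacher_hidden) or 1.0
    return num / den


def dcd_loss(teacher, student, hidden_weight=1.0):
    """Hypothetical distillation loss for one token position.

    `teacher` and `student` are dicts with:
      "logits": list of floats over the vocabulary
      "hidden": list of per-layer hidden vectors (lists of floats)
    """
    logit_term = kl_divergence(teacher["logits"], student["logits"])
    hidden_term = sum(
        hidden_state_distance(t, s)
        for t, s in zip(teacher["hidden"], student["hidden"])
    )
    return logit_term + hidden_weight * hidden_term


# Toy example: teacher saw the document, student relies on its KM only.
teacher_out = {"logits": [2.0, 0.5, -1.0], "hidden": [[1.0, -1.0], [0.5, 0.5]]}
student_out = {"logits": [1.0, 1.0, -1.0], "hidden": [[0.8, -0.9], [0.5, 0.2]]}
loss = dcd_loss(teacher_out, student_out)
```

The loss is zero when the student reproduces the teacher exactly and grows as either the output distribution or the intermediate representations drift apart, which is the behavior a KM would be trained to minimize.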
