LG AI · Mar 11, 2025

Training Plug-n-Play Knowledge Modules with Deep Context Distillation

Lucas Caccia, Alan Ansell, Edoardo Ponti, Ivan Vulić, Alessandro Sordoni

arXiv:2503.08727v422.019 citations · h-index: 42

Originality: Incremental advance

AI Analysis

This paper addresses dynamic knowledge integration for users of large language models. It offers a modular approach whose contribution is an incremental improvement to the training objective.

The paper tackles the challenge of integrating new or evolving information into large language models after pre-training, particularly in low-data or private-document scenarios. It proposes document-level Knowledge Modules (KMs) trained with Deep Context Distillation, an objective that outperforms standard next-token prediction and pre-instruction training across two datasets.

Dynamically integrating new or rapidly evolving information after (Large) Language Model pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training document-level Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly as the training objective for KMs. We instead propose Deep Context Distillation: we learn KM parameters so as to simulate the hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets. Finally, we highlight synergies between KMs and RAG.
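
As a rough illustration of the Deep Context Distillation objective described in the abstract, the sketch below combines a term that matches the teacher's output logits with a term that matches its hidden states. The abstract does not say which distance functions or weighting are used, so the KL divergence, the normalized L1 distance, the `hidden_weight` parameter, and all function names here are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) over one next-token distribution (assumed choice)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def hidden_state_distance(teacher_hidden, student_hidden):
    """Sum over layers of the L1 distance between hidden vectors,
    each normalized by the L1 norm of the teacher vector (assumed choice)."""
    total = 0.0
    for t_layer, s_layer in zip(teacher_hidden, student_hidden):
        diff = sum(abs(t - s) for t, s in zip(t_layer, s_layer))
        norm = sum(abs(t) for t in t_layer) or 1.0  # guard against division by zero
        total += diff / norm
    return total


def dcd_loss(teacher_logits, student_logits,
             teacher_hidden, student_hidden, hidden_weight=1.0):
    """Hypothetical combined objective: logit matching plus hidden-state matching."""
    logit_term = kl_divergence(teacher_logits, student_logits)
    hidden_term = hidden_state_distance(teacher_hidden, student_hidden)
    return logit_term + hidden_weight * hidden_term


if __name__ == "__main__":
    # Toy values for a single token position and two layers.
    teacher_logits = [2.0, 0.5, -1.0]
    student_logits = [1.5, 0.7, -0.8]
    teacher_hidden = [[1.0, -1.0], [0.5, 0.25]]
    student_hidden = [[0.9, -1.1], [0.4, 0.30]]
    print(dcd_loss(teacher_logits, student_logits, teacher_hidden, student_hidden))
```

In a real setup the teacher values would come from the model with the document in context, and the student values from the model with the KM plugged in, presumably without the document. Only the KM (LoRA) parameters would then be updated to reduce this loss.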
