Decoupling Knowledge and Task Subspaces for Composable Parametric Retrieval Augmented Generation
For researchers working on parametric retrieval-augmented generation, this work identifies and mitigates a key failure mode in adapter composition, though the improvement is incremental.
The paper addresses the entanglement of document-specific facts and task-solving behavior in Parametric Retrieval-Augmented Generation (PRAG) adapters, which degrades composition reliability. By introducing Orthogonal Subspace Decomposition (OSD) to separate task and knowledge subspaces, the authors show improved compositional robustness when merging multiple document adapters.
Parametric Retrieval-Augmented Generation (PRAG) encodes external documents into lightweight parameter modules that can be retrieved and merged at inference time, offering a promising alternative to in-context retrieval augmentation. Despite its potential, many PRAG implementations train document adapters with task-supervised objectives, which may cause each adapter to encode both document-specific facts and reusable task-solving behavior. This entanglement may make adapter composition less reliable: when multiple adapters are merged at inference time, their overlapping task behaviors can accumulate together with document-specific updates, potentially making the merged adapter less stable and less focused on the intended document knowledge. To examine this issue, we explore Orthogonal Subspace Decomposition (OSD), an adapter-training setup that separates reusable task behavior from document-specific knowledge adapters. Concretely, we first train a Task LoRA to capture reusable task behavior, and then train document LoRAs to encode document-specific knowledge in an orthogonal subspace. This setup provides a controlled way to examine how orthogonalizing task and document LoRA updates affects adapter composition in multi-document PRAG. Experiments across multiple knowledge-intensive tasks and model scales suggest that this orthogonalization strategy can improve compositional robustness in parametric RAG, especially when multiple document adapters are merged.
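The abstract does not say how orthogonality is enforced while the document LoRAs are trained. As a rough illustration of the underlying idea only, the NumPy sketch below assumes a simpler post-hoc variant: each document update is projected out of the column space of the Task LoRA's `B` matrix before merging. The helper names (`lora_delta`, `project_out`), the matrix shapes, and the random weights are all hypothetical and are not taken from the paper.

```python
import numpy as np


def lora_delta(B, A):
    """Dense weight update represented by a LoRA pair (delta_W = B @ A)."""
    return B @ A


def project_out(delta, basis_source):
    """Remove the part of `delta` lying in the column space of `basis_source`.

    Assumption: orthogonality is measured against the column space of the
    task adapter's B matrix. The paper may define the subspace differently.
    """
    # Reduced QR gives an orthonormal basis Q for the column space.
    Q, _ = np.linalg.qr(basis_source)
    return delta - Q @ (Q.T @ delta)


rng = np.random.default_rng(0)
d_out, d_in, rank = 16, 12, 4  # hypothetical layer and LoRA sizes

# Stand-in for a trained Task LoRA (random here, for illustration only).
B_task = rng.normal(size=(d_out, rank))
A_task = rng.normal(size=(rank, d_in))
task_delta = lora_delta(B_task, A_task)

# Stand-ins for three document LoRAs retrieved at inference time.
doc_deltas = [
    lora_delta(rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))
    for _ in range(3)
]

# Project each document update onto the orthogonal complement of the task subspace.
doc_deltas_orth = [project_out(d, B_task) for d in doc_deltas]

# Merge: one copy of the task behavior plus the orthogonalized document updates.
merged_delta = task_delta + sum(doc_deltas_orth)

# Frobenius inner products between the task update and each projected document update.
overlaps = [float(np.sum(task_delta * d)) for d in doc_deltas_orth]
```

In this sketch, the entries of `overlaps` are zero up to floating-point error, so summing several document updates adds nothing along the task directions. A training-time constraint, as described in the abstract, would aim for the same property without discarding part of an already trained update.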