CL LG · Mar 5, 2025

Enhancing LLM Knowledge Learning through Generalization

Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia

arXiv:2503.03705v2 · 2 citations · h-index: 12 · EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the problem of costly and unreliable knowledge integration for LLM developers, offering an incremental improvement over existing methods.

The paper tackles the challenge of integrating evolving factual knowledge into large language models. It improves generalization to diverse paraphrased contexts through two techniques, formatting-based data augmentation and sharpness-aware minimization, which in turn enhance knowledge acquisition.
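
To make formatting-based data augmentation concrete, here is a minimal Python sketch of the general idea: produce several layouts of one document while leaving its words and their order untouched. The function name `format_variants` and the four layouts (single line, hard-wrapped, indented, double-spaced) are assumptions chosen for illustration; the listing does not say which formats the authors actually use.

```python
import textwrap


def format_variants(text: str, width: int = 40) -> list[str]:
    """Return layout-only variants of `text` (hypothetical helper).

    Every variant contains the same words in the same order, so the
    factual content cannot drift the way it might under paraphrasing.
    """
    words = text.split()

    # Variant 1: a single line with normalized spacing.
    single = " ".join(words)

    # Variant 2: hard-wrapped lines; never split a word or a hyphenated term.
    wrapped = textwrap.fill(
        single, width=width, break_long_words=False, break_on_hyphens=False
    )

    # Variant 3: the wrapped text indented as a block.
    indented = textwrap.indent(wrapped, "    ")

    # Variant 4: double spaces between words.
    wide = "  ".join(words)

    return [single, wrapped, indented, wide]


if __name__ == "__main__":
    doc = "Marie Curie won the Nobel Prize in Physics in 1903."
    for variant in format_variants(doc):
        print(repr(variant))
```

Because each variant splits back into the same word sequence, the check `variant.split() == doc.split()` is a cheap guard that an augmentation changed only the formatting.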

As large language models (LLMs) are increasingly deployed in diverse applications, faithfully integrating evolving factual knowledge into these models remains a critical challenge. Continued pre-training on paraphrased data has shown empirical promise for enhancing knowledge acquisition. However, this approach is often costly and unreliable, as it relies on external models or manual effort for rewriting, and may inadvertently alter the factual content. In this work, we hypothesize and empirically show that an LLM's ability to continually predict the same factual knowledge tokens given diverse paraphrased contexts is positively correlated with its capacity to extract that knowledge via question-answering. Based on this view, and aiming to improve generalization to diverse paraphrased contexts, we introduce two strategies to enhance LLMs' ability to predict the same knowledge tokens given varied contexts, thereby enhancing knowledge acquisition. First, we propose formatting-based data augmentation, which diversifies documents conveying the same knowledge by altering document formats rather than their content, thereby preserving factual integrity. Second, we adopt sharpness-aware minimization as the optimizer to further improve generalization. Extensive experiments demonstrate our methods' effectiveness in both continued pre-training and instruction tuning, and further gains can be achieved by combining them with paraphrased data.
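
For readers unfamiliar with sharpness-aware minimization (SAM), the sketch below shows its standard two-step update on a toy quadratic loss in NumPy. It first moves the weights a distance `rho` along the normalized gradient, then applies the gradient measured at that perturbed point to the original weights. The names `sam_step`, `loss`, and `grad`, the curvature vector, and the values of `lr` and `rho` are all illustrative assumptions, not the authors' training setup, which applies SAM to LLM training rather than a two-parameter toy problem.

```python
import numpy as np

# Toy quadratic loss with different curvature per coordinate (assumed example).
CURVATURE = np.array([1.0, 10.0])


def loss(w: np.ndarray) -> float:
    return float(0.5 * np.sum(CURVATURE * w**2))


def grad(w: np.ndarray) -> np.ndarray:
    return CURVATURE * w


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update: ascend to a nearby worst-case point, then descend."""
    g = grad_fn(w)
    # Step 1: perturb the weights by rho along the normalized gradient.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: take the gradient at the perturbed weights ...
    g_perturbed = grad_fn(w + eps)
    # ... and apply it to the original (unperturbed) weights.
    return w - lr * g_perturbed


if __name__ == "__main__":
    w = np.array([1.0, 1.0])
    print("initial loss:", loss(w))
    for _ in range(200):
        w = sam_step(w, grad)
    print("final loss:", loss(w))
```

Each SAM step costs two gradient evaluations instead of one, which is the main overhead to weigh against any generalization benefit.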
