CL · Apr 21, 2025

Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation

Jiajun Shen, Tong Zhou, Yubo Chen, Delai Qiu, Shengping Liu, Kang Liu, Jun Zhao

arXiv:2504.14856v1 · 4 citations · h-index: 26 · ACL

Originality: Incremental advance

AI Analysis

This work addresses the trustworthiness of citations generated by large language models, but it appears incremental because it builds on existing retrieval-augmented generation and citation methods.

The paper tackles opaque internal knowledge utilization and untrustworthy citations in large language models. It introduces the Context-Prior Augmented Citation Generation task, which requires models to generate citations based on both external and internal knowledge, and reports that the proposed method achieves better cross-scenario performance than the baselines.

While hallucinations of large language models can be alleviated through retrieval-augmented generation and citation generation, how the model utilizes internal knowledge is still opaque, and the trustworthiness of its generated answers remains questionable. In this work, we introduce the Context-Prior Augmented Citation Generation task, which requires models to generate citations that consider both external and internal knowledge while providing trustworthy references, with 5 evaluation metrics covering 3 aspects: answer helpfulness, citation faithfulness, and trustworthiness. We introduce RAEL, the paradigm for our task, and also design INTRALIGN, an integrated method containing customary data generation and an alignment algorithm. Our experimental results show that our method achieves better cross-scenario performance than other baselines. Our extended experiments further reveal that retrieval quality, question types, and model knowledge have considerable influence on trustworthiness in citation generation.
