CL AI Nov 6, 2024

QUILL: Quotation Generation Enhancement of Large Language Models

Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing Liang, Feng Wei, Jinglei Chen, Zujie Liang, Deqing Yang, Yanghua Xiao

arXiv:2411.03675v22.71 citations h-index: 22 Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the tendency of LLMs to hallucinate or underperform when generating quotations for writing assistance. The advance is incremental, as it builds on existing retrieval and ranking methods.

The paper tackles LLMs' difficulty with quotation generation by establishing an automatic evaluation system and constructing a bilingual knowledge base of up to 32,022 quotes. It reports that its metrics correlate strongly with human preferences.

While large language models (LLMs) have become excellent writing assistants, they still struggle with quotation generation. This is because they either hallucinate when providing factual quotations or fail to provide quotes that exceed human expectations. To bridge the gap, we systematically study how to evaluate and improve LLMs' performance in quotation generation tasks. We first establish a holistic and automatic evaluation system for the quotation generation task, which consists of five criteria, each with a corresponding automatic metric. To improve LLMs' quotation generation abilities, we construct a bilingual knowledge base that is broad in scope and rich in dimensions, containing up to 32,022 quotes. Moreover, guided by our criteria, we further design a quotation-specific metric to rerank the retrieved quotations from the knowledge base. Extensive experiments show that our metrics strongly correlate with human preferences. Existing LLMs struggle to generate desired quotes, but our quotation knowledge base and reranking metric help narrow this gap. Our dataset and code are publicly available at https://github.com/GraceXiaoo/QUILL.
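The retrieve-then-rerank idea described in the abstract can be illustrated with a toy sketch. This is not the QUILL implementation: the bag-of-words cosine retriever, the `quality` field, the equal weighting, and the placeholder quotes and authors are all assumptions made purely for illustration. The paper's actual reranking metric is built from its five criteria, which are not detailed here.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization; keeps purely alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]

def cosine(a, b):
    # Cosine similarity between two token lists via bag-of-words counts.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(context, kb, k=3):
    # Stage 1: return the k quotes most similar to the writing context.
    ctx = tokenize(context)
    scored = [(cosine(ctx, tokenize(q["text"])), q) for q in kb]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def rerank(candidates, weight=0.5):
    # Stage 2: reorder candidates by mixing retrieval similarity with a
    # per-quote score. "quality" is a hypothetical stand-in for a
    # quotation-specific metric, not the metric defined in the paper.
    def combined(pair):
        similarity, quote = pair
        return weight * similarity + (1 - weight) * quote["quality"]
    return sorted(candidates, key=combined, reverse=True)

# Placeholder knowledge base: texts, authors, and scores are invented.
kb = [
    {"text": "patience turns small steps into long journeys",
     "author": "placeholder A", "quality": 0.9},
    {"text": "small steps every day",
     "author": "placeholder B", "quality": 0.2},
    {"text": "the sea keeps no record of storms",
     "author": "placeholder C", "quality": 0.8},
]
context = "she kept taking small steps"

top = retrieve(context, kb, k=2)
best = rerank(top)[0][1]
print(best["author"], "-", best["text"])
```

In this toy setup the retriever alone prefers the shorter quote with higher lexical overlap, while the reranking stage promotes the candidate with the higher placeholder score, which is the general role a reranking metric plays after retrieval.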
