A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation
This work addresses hallucination and inaccuracy in conversational AI for knowledge-based text generation, but its contribution is incremental, since it compares existing models and techniques.
The study examines how well conversational large language models generate natural language text from semantic triples and finds that few-shot prompting, post-processing, and fine-tuning significantly improve performance, especially for smaller models with weaker zero-shot capabilities.
Generating natural language text from graph-structured data is essential for conversational information seeking. Semantic triples derived from knowledge graphs can serve as a valuable source for grounding responses from conversational agents by providing a factual basis for the information they communicate. This is especially relevant in the context of large language models, which offer great potential for conversational interaction but are prone to hallucinating, omitting, or producing conflicting information. In this study, we conduct an empirical analysis of conversational large language models in generating natural language text from semantic triples. We compare four large language models of varying sizes with different prompting techniques. Through a series of benchmark experiments on the WebNLG dataset, we analyze the models' performance and identify the most common issues in the generated predictions. Our findings show that the capabilities of large language models in triple verbalization can be significantly improved through few-shot prompting, post-processing, and efficient fine-tuning techniques, particularly for smaller models that exhibit lower zero-shot performance.
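To make the triple verbalization setup concrete, the following is a minimal sketch of how a few-shot prompt could be assembled from semantic triples and how a raw model generation could be post-processed. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the `(subject | predicate | object)` linearization, the function names `linearize`, `build_prompt`, and `postprocess`, and the example triples are all hypothetical, and the post-processing rule (keep the first non-empty line) is only one plausible heuristic.

```python
from typing import List, Tuple

# A semantic triple: (subject, predicate, object). Hypothetical type alias.
Triple = Tuple[str, str, str]


def linearize(triples: List[Triple]) -> str:
    """Render triples as '(subject | predicate | object)' separated by spaces.

    This linearization format is an assumption made for illustration.
    """
    return " ".join(f"({s} | {p} | {o})" for s, p, o in triples)


def build_prompt(
    examples: List[Tuple[List[Triple], str]],
    target: List[Triple],
    instruction: str = "Verbalize the triples as fluent English text.",
) -> str:
    """Build a few-shot prompt: instruction, demonstrations, then the target.

    With an empty `examples` list this degrades to a zero-shot prompt.
    """
    parts = [instruction, ""]
    for triples, text in examples:
        parts.append(f"Triples: {linearize(triples)}")
        parts.append(f"Text: {text}")
        parts.append("")
    parts.append(f"Triples: {linearize(target)}")
    parts.append("Text:")
    return "\n".join(parts)


def postprocess(generation: str) -> str:
    """Keep the first non-empty line and strip surrounding quotes.

    Assumed heuristic: models often continue past the answer with extra
    lines, so everything after the first non-empty line is discarded.
    """
    for line in generation.splitlines():
        line = line.strip()
        if line:
            return line.strip('"').strip()
    return ""


# Illustrative demonstration pair and target (made up for this sketch).
examples = [
    ([("Alan_Bean", "occupation", "Test_pilot")],
     "Alan Bean worked as a test pilot."),
]
target = [("Aarhus_Airport", "cityServed", "Aarhus")]
prompt = build_prompt(examples, target)
```

The resulting `prompt` string would be sent to whichever conversational model is under evaluation, and `postprocess` would be applied to its reply before scoring. Keeping the demonstrations in a list makes it easy to compare zero-shot and few-shot conditions by changing only the `examples` argument.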