CL · Feb 28, 2024

WIKIGENBENCH: Exploring Full-length Wikipedia Generation under Real-World Scenario

arXiv:2402.18264v2 · 21 citations · h-index: 12 · COLING
AI Analysis

This paper addresses the problem of automated Wikipedia article generation for new events. The contribution is incremental: it builds on existing RAG frameworks and adds a new benchmark.

The paper tackles the challenge of generating comprehensive and accurate full-length Wikipedia articles for new events under real-world scenarios. It constructs WIKIGENBENCH, a benchmark with 1,320 entries, and shows that hierarchical-based methods improve comprehensiveness while fine-tuned methods enhance verifiability, although a significant gap remains compared to existing Wikipedia content.

Generating comprehensive and accurate Wikipedia articles for newly emerging events under a real-world scenario presents significant challenges. Existing attempts fall short either by focusing only on short snippets or by using metrics that are insufficient to evaluate real-world scenarios. In this paper, we construct WIKIGENBENCH, a new benchmark consisting of 1,320 entries, designed to align with real-world scenarios in both generation and evaluation. For generation, we explore a real-world scenario where structured, full-length Wikipedia articles with citations are generated for new events using input documents from web sources. For evaluation, we integrate systematic metrics and LLM-based metrics to assess the verifiability, organization, and other aspects aligned with real-world scenarios. Based on this benchmark, we conduct extensive experiments using various models within three commonly used frameworks: direct RAG, hierarchical structure-based RAG, and RAG with a fine-tuned generation model. Experimental results show that hierarchical-based methods can generate more comprehensive content, while fine-tuned methods achieve better verifiability. However, even the best methods still show a significant gap compared to existing Wikipedia content, indicating that further research is necessary.
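To make the difference between the first two frameworks concrete, here is a minimal sketch. It is not the paper's implementation: the word-overlap retriever, the `stub_generate` placeholder standing in for an LLM call, the `== Heading ==` section format, and the assumption that the outline is supplied as input are all hypothetical choices made for illustration. Direct RAG retrieves once for the whole article, while the hierarchical variant retrieves and generates once per section heading. The third framework (RAG with a fine-tuned generator) would change only the generator, so it is not sketched separately.

```python
from typing import List, Tuple


def retrieve(query: str, docs: List[str], k: int) -> List[int]:
    """Toy retriever: rank docs by word overlap with the query, return top-k indices."""
    q = set(query.lower().split())
    ranked = sorted(range(len(docs)),
                    key=lambda i: -len(q & set(docs[i].lower().split())))
    return ranked[:k]


def stub_generate(evidence: List[Tuple[int, str]]) -> str:
    """Placeholder for an LLM call: echoes each evidence doc with a [n] citation."""
    return " ".join(f"{text} [{idx + 1}]" for idx, text in evidence)


def direct_rag(title: str, docs: List[str], k: int = 2) -> str:
    """Direct RAG: one retrieval pass for the whole article, one generation call."""
    ids = retrieve(title, docs, k)
    return stub_generate([(i, docs[i]) for i in ids])


def hierarchical_rag(title: str, outline: List[str], docs: List[str], k: int = 1) -> str:
    """Hierarchical RAG: retrieve and generate separately for each outline heading."""
    sections = []
    for heading in outline:
        ids = retrieve(f"{title} {heading}", docs, k)
        body = stub_generate([(i, docs[i]) for i in ids])
        sections.append(f"== {heading} ==\n{body}")
    return "\n".join(sections)


if __name__ == "__main__":
    # Made-up input documents, for illustration only.
    docs = [
        "The earthquake struck the coast at dawn.",
        "Rescue teams arrived the next day.",
        "Markets were unaffected.",
    ]
    print(direct_rag("earthquake coast", docs, k=1))
    print(hierarchical_rag("earthquake", ["Background", "Response"], docs))
```

Because each generated sentence carries a bracketed index into the input documents, a verifiability check could in principle compare every sentence against the document it cites; how the benchmark actually scores this is not specified in the abstract.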

Code Implementations: 1 repo
