Mar 10, 2025

DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

arXiv:2503.07170v1, 1 citation, h-index: 2
Originality: Incremental advance
AI Analysis

This work gives researchers and practitioners a new dataset for long-form article generation. It is an incremental advance: it builds on existing methods and contributes richer annotations.

The authors address long-form article generation by introducing DeFine, a dataset with hierarchical decomposition and fine-grained annotations. Fine-tuning the Qwen2-7b-Instruct model on DeFine led to significant improvements in text quality, including topic coverage, depth, and content fidelity.

Long-form article generation (LFAG) presents challenges such as maintaining logical consistency, comprehensive topic coverage, and narrative coherence across extended articles. Existing datasets often lack both the hierarchical structure and the fine-grained annotation needed to decompose the task effectively, which leads to shallow, disorganized generated articles. To address these limitations, we introduce DeFine, a Decomposed and Fine-grained annotated dataset for long-form article generation. DeFine is characterized by a hierarchical decomposition strategy and by the integration of domain-specific knowledge with multi-level annotations, which together enable granular control and greater depth in article generation. To construct the dataset, we propose a multi-agent collaborative pipeline that systematically segments the generation process into four parts: Data Miner, Cite Retriever, Q&A Annotator, and Data Cleaner. To validate the effectiveness of DeFine, we designed and tested three LFAG baselines: web retrieval, local retrieval, and grounded reference. We fine-tuned the Qwen2-7b-Instruct model on the DeFine training set. The experimental results showed significant improvements in text quality, specifically in topic coverage, depth of information, and content fidelity. Our dataset is publicly available to facilitate future research.
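To make the four-stage construction pipeline concrete, here is a minimal sketch in Python of how the stages could pass a hierarchical article record from one to the next. Everything in it is an assumption for illustration. The record schema (`Article`, `Section`), the function names, and the rule each stage applies are hypothetical stand-ins. In DeFine each stage is an agent whose prompts and outputs are not described on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical record schema; DeFine's real annotation format is not shown here.
@dataclass
class Section:
    heading: str
    text: str
    citations: List[str] = field(default_factory=list)
    qa_pairs: List[Dict[str, str]] = field(default_factory=list)


@dataclass
class Article:
    title: str
    sections: List[Section] = field(default_factory=list)


def data_miner(raw: Dict[str, str], title: str) -> Article:
    """Hypothetical Data Miner: turn a heading -> text mapping into a hierarchical record."""
    return Article(title, [Section(h, t) for h, t in raw.items()])


def cite_retriever(article: Article, corpus: Dict[str, str]) -> Article:
    """Hypothetical Cite Retriever: attach ids of corpus documents that share
    a word with the section heading (a crude stand-in for real search)."""
    for sec in article.sections:
        heading_words = set(sec.heading.lower().split())
        sec.citations = [doc_id for doc_id, doc in corpus.items()
                         if heading_words & set(doc.lower().split())]
    return article


def qa_annotator(article: Article) -> Article:
    """Hypothetical Q&A Annotator: one templated question/answer pair per
    section, standing in for model-generated annotations."""
    for sec in article.sections:
        sec.qa_pairs = [{"question": f"What does the article say about {sec.heading}?",
                         "answer": sec.text}]
    return article


def data_cleaner(article: Article) -> Article:
    """Hypothetical Data Cleaner: drop sections that are empty or lack any citation."""
    article.sections = [s for s in article.sections if s.text.strip() and s.citations]
    return article


def run_pipeline(raw: Dict[str, str], title: str, corpus: Dict[str, str]) -> Article:
    """Run the four stages in the order listed in the abstract."""
    article = data_miner(raw, title)
    article = cite_retriever(article, corpus)
    article = qa_annotator(article)
    return data_cleaner(article)


if __name__ == "__main__":
    raw = {"Origins": "The festival began in 1900.",
           "Modern events": "It now draws large crowds.",
           "Trivia": ""}
    corpus = {"doc1": "history and origins of the festival",
              "doc2": "events held in modern times"}
    for sec in run_pipeline(raw, "Festival", corpus).sections:
        print(sec.heading, sec.citations)
```

Each stage maps an `Article` to an `Article`, so any one of them could be replaced by a model-backed agent without touching the others. The keyword-overlap matching in `cite_retriever` is a deliberately simple placeholder, not the retrieval method used to build DeFine.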
