Hierarchical Catalogue Generation for Literature Review: A Benchmark
This work addresses the problem of generating logical hierarchies for literature reviews, which is useful to researchers, but its contribution is incremental because it focuses on one specific step of review generation.
The paper introduces a new task, Hierarchical Catalogue Generation for Literature Review, to organize reference papers into a structured hierarchy, and constructs a dataset with 7.6k catalogues and 389k reference papers, benchmarking models like BART and ChatGPT with designed evaluation metrics.
Scientific literature review generation aims to extract and organize important information from a large collection of reference papers and produce corresponding reviews, yet the generated reviews often lack a clear and logical hierarchy. We observe that a high-quality catalogue-guided generation process can effectively alleviate this problem. Therefore, we present an atomic and challenging task named Hierarchical Catalogue Generation for Literature Review as the first step for review generation, which aims to produce a hierarchical catalogue of a review paper given various references. We construct a novel English Hierarchical Catalogues of Literature Reviews Dataset with 7.6k literature review catalogues and 389k reference papers. To accurately assess model performance, we design two evaluation metrics that measure informativeness and similarity to the ground truth in terms of semantics and structure. Our extensive analyses verify the high quality of our dataset and the effectiveness of our evaluation metrics. We further benchmark state-of-the-art summarization models such as BART and large language models such as ChatGPT to evaluate their capabilities. Finally, we discuss potential directions for this task to motivate future research.
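To make the task's output concrete, a hierarchical catalogue can be pictured as a tree of headings. The following is a minimal illustrative sketch in Python, not the paper's data format or code: the `Heading` class, the `flatten` and `depth` helpers, and the example headings are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Heading:
    """One catalogue entry: a heading title plus its nested sub-headings.

    Hypothetical structure for illustration; the dataset's actual schema
    is not described in this text.
    """
    title: str
    children: List["Heading"] = field(default_factory=list)


def flatten(node: Heading, level: int = 1) -> List[Tuple[int, str]]:
    """Depth-first traversal returning (level, title) pairs in reading order."""
    rows = [(level, node.title)]
    for child in node.children:
        rows.extend(flatten(child, level + 1))
    return rows


def depth(node: Heading) -> int:
    """Number of heading levels in the subtree rooted at node."""
    return 1 + max((depth(c) for c in node.children), default=0)


# Invented example catalogue; the headings are placeholders, not dataset content.
catalogue = Heading("Example Survey Title", [
    Heading("Introduction"),
    Heading("Methods", [
        Heading("Supervised Approaches"),
        Heading("Unsupervised Approaches"),
    ]),
    Heading("Future Directions"),
])

# Print the catalogue as an indented outline.
for level, title in flatten(catalogue):
    print("  " * (level - 1) + title)
```

Flattening the tree into (level, title) pairs is one plausible way to serialize a catalogue for a sequence-to-sequence model or to compare a generated catalogue against a reference; whether the benchmark uses such a serialization is not stated here.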