CL · Jul 23, 2024

CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support

Chao-Chun Hsu, Erin Bransom, Jenna Sparks, Bailey Kuehl, Chenhao Tan, David Wadden, Lucy Lu Wang, Aakanksha Naik

AI2 · UW

arXiv:2407.16148v1 · 31 citations · h-index: 31

Originality: Incremental advance

AI Analysis

This work helps researchers manage large volumes of scientific studies. The advance is incremental: it combines existing LLM capabilities with human feedback.

The paper tackles the challenge of organizing scientific literature for literature reviews by using LLMs to generate hierarchical structures, and finds that while LLMs are effective at creating categories, study assignment can be improved, with a corrector model boosting assignment performance by 12.6 F1 points.

Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands. In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific studies to assist researchers with literature review. We define hierarchical organizations as tree structures where nodes refer to topical categories and every node is linked to the studies assigned to that category. Our naive LLM-based pipeline for hierarchy generation from a set of studies produces promising yet imperfect hierarchies, motivating us to collect CHIME, an expert-curated dataset for this task focused on biomedicine. Given the challenging and time-consuming nature of building hierarchies from scratch, we use a human-in-the-loop process in which experts correct errors (both links between categories and study assignment) in LLM-generated hierarchies. CHIME contains 2,174 LLM-generated hierarchies covering 472 topics, and expert-corrected hierarchies for a subset of 100 topics. Expert corrections allow us to quantify LLM performance, and we find that while LLMs are quite good at generating and organizing categories, their assignment of studies to categories could be improved. We attempt to train a corrector model with human feedback, which improves study assignment by 12.6 F1 points. We release our dataset and models to encourage research on developing better assistive tools for literature review.
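The tree structure defined in the abstract, and the idea of scoring study assignment with F1, can be illustrated with a minimal Python sketch. This is not the CHIME data format or the paper's evaluation code: the `CategoryNode` class, the `assignment_f1` function, the category names, and the study IDs are all hypothetical, and the sketch assumes that predicted and expert-corrected hierarchies share category names so that (category, study) pairs can be compared directly.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """One topical category in a hierarchy (hypothetical structure)."""
    name: str
    study_ids: set = field(default_factory=set)   # studies assigned to this category
    children: list = field(default_factory=list)  # child CategoryNode objects

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def assignment_pairs(root):
    """Collect every (category name, study id) link in the tree."""
    return {(node.name, sid) for node in root.walk() for sid in node.study_ids}


def assignment_f1(predicted, gold):
    """F1 over (category, study) pairs; assumes matching category names."""
    pred, ref = assignment_pairs(predicted), assignment_pairs(gold)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy expert-corrected hierarchy (illustrative names and IDs only).
gold = CategoryNode("Treatments", children=[
    CategoryNode("Drug therapy", {"s1", "s2"}),
    CategoryNode("Surgery", {"s3"}),
])

# Toy LLM-generated hierarchy that misassigns study s3.
predicted = CategoryNode("Treatments", children=[
    CategoryNode("Drug therapy", {"s1", "s2", "s3"}),
    CategoryNode("Surgery"),
])

score = assignment_f1(predicted, gold)
print(f"assignment F1: {score:.3f}")
```

In this toy case two of the three predicted links match the expert version, so precision and recall are both 2/3. The metric actually used in the paper may be defined differently, for example per category or with a different matching scheme.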
