CL  MTRL-SCI  AI  IR
Jun 8, 2024

Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets

arXiv:2406.05348v3
9 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of automating scientific information extraction for materials researchers, but its contribution is incremental because it builds on existing datasets and methods.

The study tested whether GPT-4 could replicate two materials science datasets using basic prompting, and found that it struggles to extract the information faithfully, as shown by a manual error analysis.

We explore the ability of GPT-4 to perform ad-hoc schema-based information extraction from scientific literature. We assess specifically whether it can, with a basic prompting approach, replicate two existing materials science datasets, given the manuscripts from which they were originally manually extracted. We employ materials scientists to perform a detailed manual error analysis to assess where the model struggles to faithfully extract the desired information, and draw on their insights to suggest research directions to address this broadly important task.
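
To make "ad-hoc schema-based extraction" concrete, the following is a minimal sketch of how such a pipeline might be wired up. It is not the authors' code. The schema fields, the prompt wording, and the function names (`build_prompt`, `parse_records`, `score`) are all hypothetical, and the exact-match scoring is a simple automatic stand-in for the expert manual error analysis described in the abstract. The call to the model itself is deliberately left out.

```python
import json

# Hypothetical schema: these field names and descriptions are illustrative
# assumptions, not the schema of either dataset studied in the paper.
SCHEMA = {
    "material": "chemical formula or name of the material",
    "property": "name of the measured property",
    "value": "numeric value as reported",
    "unit": "unit of the reported value",
}


def build_prompt(schema, manuscript_text):
    """Build a basic extraction prompt from a schema and a manuscript."""
    fields = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    return (
        "Extract every record matching this schema from the manuscript.\n"
        f"Fields:\n{fields}\n"
        "Answer with a JSON list of objects using exactly these field names.\n\n"
        f"Manuscript:\n{manuscript_text}"
    )


def parse_records(model_output, schema):
    """Parse the model's JSON answer; return [] if it is malformed."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep only schema fields, normalised to lowercase strings.
    return [
        {k: str(item.get(k, "")).strip().lower() for k in schema}
        for item in data
        if isinstance(item, dict)
    ]


def score(extracted, gold, schema):
    """Exact-match precision and recall over whole records."""
    def key(record):
        return tuple(str(record.get(k, "")).strip().lower() for k in schema)

    pred = {key(r) for r in extracted}
    ref = {key(r) for r in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall
```

Exact matching over whole records is strict: a record with one wrong unit counts as both a false positive and a missed gold record. That is one reason a manual analysis by domain experts can reveal more about where extraction fails than an automatic score.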

Code Implementations: 1 repo
