CL, Oct 24, 2023

MuLMS: A Multi-Layer Annotated Text Corpus for Information Extraction in the Materials Science Domain

arXiv:2310.15569v1125 citations, h-index: 4
Originality: Synthesis-oriented
AI Analysis

MuLMS gives materials science researchers a foundational dataset for improving information extraction, though the contribution is incremental because it builds on prior domain-specific resources.

The authors address the lack of comprehensive datasets for information extraction in materials science by introducing MuLMS, a multi-layer annotated corpus of 50 articles across seven sub-domains. They also present competitive neural models for all tasks and show that multi-task training with existing related resources is beneficial.

Keeping track of all relevant recent publications and experimental results for a research area is a challenging task. Prior work has demonstrated the efficacy of information extraction models in various scientific areas. Recently, several datasets have been released for the yet understudied materials science domain. However, these datasets focus on sub-problems such as parsing synthesis procedures or on sub-domains, e.g., solid oxide fuel cells. In this resource paper, we present MuLMS, a new dataset of 50 open-access articles, spanning seven sub-domains of materials science. The corpus has been annotated by domain experts with several layers ranging from named entities over relations to frame structures. We present competitive neural models for all tasks and demonstrate that multi-task training with existing related resources leads to benefits.

Code Implementations: 1 repo
