CL · May 15, 2020

A Scientific Information Extraction Dataset for Nature Inspired Engineering

arXiv:2005.07753v2
AI Analysis

This dataset improves engineers' access to scientific biology information and facilitates interdisciplinary research. The contribution is incremental, however, because it focuses on creating data rather than on new methods.

The paper addresses the extraction of domain-independent relations from scientific biology texts. It introduces a manually annotated dataset of 1,500 sentences for training and evaluating Relation Extraction algorithms for Nature Inspired Engineering.

Nature has inspired various ground-breaking technological developments in applications ranging from robotics to aerospace engineering and the manufacturing of medical devices. However, accessing the information captured in scientific biology texts is time-consuming and difficult, and it requires domain-specific knowledge. Improving access for outsiders can benefit interdisciplinary research such as Nature Inspired Engineering. This paper describes a dataset of 1,500 manually annotated sentences that express domain-independent relations between central concepts in a scientific biology text, such as trade-offs and correlations. The arguments of these relations can be Multi-Word Expressions and have been annotated with modifying phrases to form non-projective graphs. The dataset allows for training and evaluating Relation Extraction algorithms that aim for coarse-grained typing of scientific biological documents, enabling a high-level filter for engineers.
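To make "relations between Multi-Word Expressions that form non-projective graphs" concrete, here is a minimal Python sketch. It is not the dataset's actual schema or release format: the classes `Span`, `Edge` and `AnnotatedSentence`, the label `CORRELATION`, the choice to anchor each span at its first token, and the toy sentence are all hypothetical. A graph counts as non-projective when at least two of its arcs cross when drawn above the sentence; "respectively" constructions are a typical source of such crossings.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Span:
    """A contiguous token span; arguments may be multi-word expressions."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)


@dataclass(frozen=True)
class Edge:
    """A labelled arc between two spans (a relation or a modifier attachment)."""
    label: str  # hypothetical label names, e.g. "CORRELATION"
    source: Span
    target: Span


@dataclass
class AnnotatedSentence:
    tokens: list[str]
    edges: list[Edge] = field(default_factory=list)

    def text(self, span: Span) -> str:
        return " ".join(self.tokens[span.start:span.end])


def _arc(edge: Edge) -> tuple[int, int]:
    # Assumption: each span is anchored at its first token when drawing arcs.
    a, b = sorted((edge.source.start, edge.target.start))
    return a, b


def arcs_cross(e1: Edge, e2: Edge) -> bool:
    """Arcs cross when their endpoints strictly interleave (a < c < b < d).

    Arcs that share an endpoint, or where one is nested inside the other,
    do not cross.
    """
    a, b = _arc(e1)
    c, d = _arc(e2)
    return a < c < b < d or c < a < d < b


def is_non_projective(sentence: AnnotatedSentence) -> bool:
    # Non-projective if any pair of arcs crosses.
    return any(arcs_cross(e1, e2) for e1, e2 in combinations(sentence.edges, 2))


# Toy example (invented sentence, hypothetical annotation).
tokens = ("Heavier shells and thicker spines correlate with "
          "lower speed and reduced growth , respectively").split()
shells, spines = Span(0, 2), Span(3, 5)
speed, growth = Span(7, 9), Span(10, 12)
sent = AnnotatedSentence(tokens, [
    Edge("CORRELATION", shells, speed),
    Edge("CORRELATION", spines, growth),
])

print(sent.text(shells), "->", sent.text(speed))  # Heavier shells -> lower speed
print(is_non_projective(sent))                    # True
```

The two arcs (tokens 0 to 7 and 3 to 10) interleave, so the graph is non-projective even though each argument is a simple two-word span. Anchoring spans at their first token is only one convention; anchoring at a syntactic head would be an equally plausible alternative.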

Code Implementations: 1 repo