Experiment Segmentation in Scientific Discourse as Clause-level Structured Prediction using Recurrent Neural Networks
This work addresses information extraction from scientific literature, which could help researchers navigate experimental details efficiently. The contribution appears incremental, since it builds on existing sequence labeling and attention methods.
The authors address the problem of identifying structure in experiment narratives from scientific literature. They develop a clause-level sequence labeling model based on recurrent neural networks with a novel attention mechanism, and report performance improvements over baseline LSTM and CRF models.
We propose a deep learning model for identifying structure within experiment narratives in scientific literature. We treat this as a sequence labeling problem and label the clauses within an experiment narrative to identify the different parts of the experiment. Our dataset consists of paragraphs taken from open access PubMed papers, labeled with rhetorical information in our pilot annotation. Our model is a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells that labels clauses. The clause representations are computed by combining word representations using a novel attention mechanism that involves a separate RNN. We compare this model against LSTMs whose input layer has simple or no attention, and against a feature-rich CRF model. Furthermore, we describe how our work could be useful for information extraction from scientific literature.
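To make the clause representation step concrete, the following is a minimal sketch of attention pooling in which a separate recurrent network scores each word and the clause vector is the attention-weighted sum of the word vectors. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the plain tanh recurrence, the scoring vector, the dimensions, the random placeholder inputs, and all names (`attention_weights`, `clause_vector`, `W_in`, `W_rec`, `v_out`). The clause-level LSTM tagger that would consume these vectors is omitted.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(words, W_in, W_rec, v_out):
    """Run a small tanh RNN over the words and score each word
    from the hidden state at its position (hypothetical scoring rule)."""
    hidden = [0.0] * len(W_rec)
    scores = []
    for word in words:
        from_input = matvec(W_in, word)
        from_state = matvec(W_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
        scores.append(sum(v * h for v, h in zip(v_out, hidden)))
    return softmax(scores)


def clause_vector(words, weights):
    """Combine word vectors into one clause vector by weighted sum."""
    dim = len(words[0])
    return [sum(w * word[d] for w, word in zip(weights, words)) for d in range(dim)]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
EMB, HID = 4, 3  # illustrative sizes, not taken from the paper

# Untrained, randomly initialised parameters of the attention RNN.
W_in = random_matrix(rng, HID, EMB)
W_rec = random_matrix(rng, HID, HID)
v_out = [rng.uniform(-0.5, 0.5) for _ in range(HID)]

# A paragraph is a list of clauses; each clause is a list of word vectors.
# Random placeholders stand in for real word embeddings.
paragraph = [
    [[rng.uniform(-1.0, 1.0) for _ in range(EMB)] for _ in range(n_words)]
    for n_words in (5, 3, 6)
]

all_weights = []
clause_vectors = []
for clause in paragraph:
    weights = attention_weights(clause, W_in, W_rec, v_out)
    all_weights.append(weights)
    clause_vectors.append(clause_vector(clause, weights))

print(len(clause_vectors), "clause vectors of dimension", len(clause_vectors[0]))
```

In a full model of this kind, the sequence `clause_vectors` would be the input to the clause-level LSTM that assigns one label per clause. Under the same assumptions, replacing the learned weights with uniform ones (plain averaging) would correspond to an input layer with no attention.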