Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science
This work addresses the challenge of efficiently analyzing quantitative discussion in large text corpora for Earth scientists, with the aim of accelerating research and making scientific investment more efficient. The contribution appears incremental, since it builds on existing extraction techniques.
The authors tackle the problem of extracting measurement values, units, and related context from natural language text in Earth science. They propose Marve, a system that combines conditional random fields (CRF) with rule-based methods; early experimentation shows high-precision extractions with strong recall. They also describe Marve's role in refining measurement requirements for NASA's proposed HyspIRI mission.
We propose Marve, a system for extracting measurement values, units, and related words from natural language text. Marve uses conditional random fields (CRF) to identify measurement values and units, followed by a rule-based system to find related entities, descriptors, and modifiers within a sentence. Sentence tokens are represented by an undirected graphical model, and rules are based on part-of-speech and word dependency patterns connecting values and units to contextual words. Marve is unique in its focus on measurement context, and early experimentation demonstrates Marve's ability to generate high-precision extractions with strong recall. We also discuss Marve's role in refining measurement requirements for NASA's proposed HyspIRI mission, a hyperspectral infrared imaging satellite that will study the world's ecosystems. In general, our work with HyspIRI demonstrates the value of semantic measurement extractions in characterizing quantitative discussion contained in large corpora of natural language text. These extractions accelerate broad, cross-cutting research and expose scientists to new algorithmic approaches and experimental nuances. They also facilitate identification of scientific opportunities enabled by HyspIRI, leading to more efficient scientific investment and research.
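The two-stage idea described above (first tag values and units, then follow part-of-speech and dependency patterns to contextual words) can be illustrated with a minimal sketch. This is not Marve's implementation. The CRF stage is replaced by a simple unit-lookup stand-in, the dependency parse is written by hand rather than produced by a parser, and the names `Token`, `UNITS`, `tag_measurements`, `related_context`, and `extract`, as well as the labels `nummod`, `pobj`, `prep`, and `amod`, are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str   # surface form
    pos: str    # coarse part-of-speech tag
    head: int   # index of the syntactic head within the sentence
    dep: str    # dependency label linking this token to its head


# Hypothetical unit vocabulary; a trained CRF would learn this from data.
UNITS = {"m", "km", "nm", "days"}


def tag_measurements(tokens):
    """Stand-in for the CRF stage: return (value_index, unit_index) pairs."""
    pairs = []
    for i, tok in enumerate(tokens):
        is_number = tok.pos == "NUM" and tok.dep == "nummod"
        if is_number and tokens[tok.head].text in UNITS:
            pairs.append((i, tok.head))
    return pairs


def related_context(tokens, unit_idx):
    """Rule stage: follow unit -> preposition -> noun, then collect adjectives."""
    unit = tokens[unit_idx]
    if unit.dep != "pobj":
        return None, []
    prep = tokens[unit.head]
    if prep.pos != "ADP" or tokens[prep.head].pos != "NOUN":
        return None, []
    entity_idx = prep.head
    descriptors = [
        t.text for t in tokens if t.dep == "amod" and t.head == entity_idx
    ]
    return tokens[entity_idx].text, descriptors


def extract(tokens):
    """Combine both stages into one record per measurement."""
    records = []
    for value_idx, unit_idx in tag_measurements(tokens):
        entity, descriptors = related_context(tokens, unit_idx)
        records.append({
            "value": tokens[value_idx].text,
            "unit": tokens[unit_idx].text,
            "entity": entity,
            "descriptors": descriptors,
        })
    return records


# Hand-written parse of: "The sensor has a spatial resolution of 60 m."
SENTENCE = [
    Token("The", "DET", 1, "det"),
    Token("sensor", "NOUN", 2, "nsubj"),
    Token("has", "VERB", 2, "ROOT"),
    Token("a", "DET", 5, "det"),
    Token("spatial", "ADJ", 5, "amod"),
    Token("resolution", "NOUN", 2, "dobj"),
    Token("of", "ADP", 5, "prep"),
    Token("60", "NUM", 8, "nummod"),
    Token("m", "NOUN", 6, "pobj"),
    Token(".", "PUNCT", 2, "punct"),
]

if __name__ == "__main__":
    print(extract(SENTENCE))
```

For the example sentence, the sketch links the value `60` and the unit `m` to the entity `resolution` and the descriptor `spatial`. A real system would need many more patterns than this single prepositional rule, which is why the sketch returns `None` for the entity when no rule matches.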