DB · CL · LG · PE · Jun 11, 2014

A machine-compiled macroevolutionary history of Phanerozoic life

arXiv:1406.2963v2 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more complete and scalable data integration in paleontology and opens up new research questions, though the advance is incremental: it applies existing machine reading techniques to a new domain.

The authors address the problem that manually assembled paleontological databases are incomplete and hard to enhance. They develop PaleoDeepDive, a machine reading system that automatically extracts data from publications. The system performs comparably to humans and generates congruent synthetic macroevolutionary results.

Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of palaeontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in complex data extraction and inference tasks and generates congruent synthetic macroevolutionary results. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We also show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry.

