DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance
The work addresses the need for better data extraction tools for geoscientists and other researchers, though its contribution is incremental, as it builds on existing AI models.
The paper tackles the problem of extracting data from geoscience literature by introducing DeepShovel, an AI-assisted online platform that, according to a user evaluation with 14 researchers, improved efficiency and encouraged collaboration.
Geoscientists, like researchers in many other fields, need to read a huge amount of literature to locate, extract, and aggregate relevant results and data, either to enable future research or to build a scientific database, yet no existing system supports this use case well. In this paper, based on the findings of a formative study of how geoscientists collaboratively annotate literature and extract and aggregate data, we propose DeepShovel, a publicly available AI-assisted data extraction system that supports their needs. DeepShovel leverages state-of-the-art neural network models to help researchers easily and accurately annotate papers (in PDF format) and extract data from tables, figures, maps, and other elements through human-AI collaboration. A follow-up user evaluation with 14 researchers suggested that DeepShovel improved users' data extraction efficiency when building scientific databases and encouraged teams to form larger-scale yet more tightly coupled collaborations.