CL DL Nov 3, 2018

Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study

Drahomira Herrmannova, Steven R. Young, Robert M. Patton, Christopher G. Stahl, Nicole C. Kleinstreuer, Mary S. Wolfe

arXiv:1811.01183v131.91089 citations

Originality: Incremental advance

AI Analysis

This work addresses a labor-intensive data extraction task for researchers in toxicology, but it is incremental as it builds on existing unsupervised techniques.

The paper addresses the manual extraction of study descriptors from toxicology research publications by developing an unsupervised method that identifies relevant text segments. A binary classifier trained on these segments performs better than one trained on randomly selected sentences.

Identifying and extracting data elements such as study descriptors in publication full texts is a critical yet manual and labor-intensive step required in a number of tasks. In this paper we address the question of identifying data elements in an unsupervised manner. Specifically, provided a set of criteria describing specific study parameters, such as species, route of administration, and dosing regimen, we develop an unsupervised approach to identify text segments (sentences) relevant to the criteria. A binary classifier trained to identify publications that met the criteria performs better when trained on the candidate sentences than when trained on sentences randomly picked from the text, supporting the intuition that our method is able to accurately identify study descriptors.
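The idea of picking candidate sentences for a criterion can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify how relevance is scored, so bag-of-words cosine similarity is swapped in here as a stand-in. The function names (`tokenize`, `cosine`, `rank_sentences`), the `top_k` parameter, the criterion wording, and the example sentences are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(criterion, sentences, top_k=2):
    """Return up to top_k sentences most similar to the criterion text.

    Assumption: relevance is approximated by word overlap; sentences
    with no overlap at all (score 0) are never returned.
    """
    query = Counter(tokenize(criterion))
    scored = [(cosine(query, Counter(tokenize(s))), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]


# Hypothetical criterion and sentences, invented for illustration only.
criterion = "species rat route of administration oral dosing regimen daily dose"
sentences = [
    "Female rats received a daily oral dose for three days.",
    "The authors thank the reviewers for helpful comments.",
    "Uterine weight was recorded at the end of the dosing period.",
]
candidates = rank_sentences(criterion, sentences)
```

In this sketch the acknowledgment sentence shares no words with the criterion and is filtered out, while the two sentences mentioning dosing are kept as candidates. A downstream classifier could then be trained on such candidates instead of on randomly sampled sentences, which is the comparison the abstract describes.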
