CL · Jun 12, 2020

Information Extraction of Clinical Trial Eligibility Criteria

Yitong Tseo, M. I. Salkola, Ahmed Mohamed, Anuj Kumar, Freddy Abnousi

arXiv:2006.07296v61.430 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automating eligibility determination for clinical trials. It is an incremental advance: it builds on existing extraction methods but introduces specific technical innovations.

The paper extracts clinical trial eligibility criteria from unstructured text and formalizes them into a computer-interpretable format. It frames the task as knowledge base population, combines machine learning with a context-free grammar, and reports performance competitive with Criteria2Query, which the authors regard as the current state of the art.

Clinical trials predicate subject eligibility on a diversity of criteria ranging from patient demographics to food allergies. Trials post their requirements as semantically complex, unstructured free-text. Formalizing trial criteria to a computer-interpretable syntax would facilitate eligibility determination. In this paper, we investigate an information extraction (IE) approach for grounding criteria from trials in ClinicalTrials.gov to a shared knowledge base. We frame the problem as a novel knowledge base population task, and implement a solution combining machine learning and context-free grammar. To our knowledge, this work is the first criteria extraction system to apply an attention-based conditional random field architecture for named entity recognition (NER), and word2vec embedding clustering for named entity linking (NEL). We release the resources and core components of our system on GitHub at https://github.com/facebookresearch/Clinical-Trial-Parser. Finally, we report our per-module and end-to-end performance; we conclude that our system is competitive with Criteria2Query, which we view as the current state-of-the-art in criteria extraction.
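To make "computer-interpretable syntax" concrete, here is a minimal sketch. It is not the authors' parser. It assumes a criterion has already been reduced to a simple `variable comparator number [unit]` phrase, and it parses that phrase against a toy grammar. The grammar, the `Relation` type, and the function names are all hypothetical, invented for illustration only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy grammar (an assumption for illustration, not the paper's grammar):
#   criterion  -> variable comparator number unit?
#   variable   -> word+
#   comparator -> ">=" | "<=" | ">" | "<" | "="
TOKEN_RE = re.compile(r"\s*(>=|<=|>|<|=|\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9/%]*)")
COMPARATORS = {">=", "<=", ">", "<", "="}


@dataclass(frozen=True)
class Relation:
    """Structured form of one numeric eligibility criterion."""
    variable: str
    comparator: str
    value: float
    unit: Optional[str]


def tokenize(text: str) -> List[str]:
    """Split a criterion into comparator, number, and word tokens."""
    text = text.rstrip()
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize input at: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_criterion(text: str) -> Relation:
    """Parse 'variable comparator number [unit]' into a Relation."""
    tokens = tokenize(text)
    # Locate the first comparator; everything before it is the variable name.
    idx = next((i for i, t in enumerate(tokens) if t in COMPARATORS), None)
    if idx is None or idx == 0:
        raise ValueError("expected: variable comparator number [unit]")
    if not all(t[0].isalpha() for t in tokens[:idx]):
        raise ValueError("variable name must consist of words")
    rest = tokens[idx + 1:]
    if not rest or len(rest) > 2:
        raise ValueError("expected a number and an optional unit after the comparator")
    try:
        value = float(rest[0])
    except ValueError:
        raise ValueError("expected a number after the comparator") from None
    unit = rest[1] if len(rest) == 2 else None
    if unit is not None and not unit[0].isalpha():
        raise ValueError("unit must start with a letter")
    return Relation(" ".join(tokens[:idx]).lower(), tokens[idx], value, unit)
```

For example, `parse_criterion("Age >= 18 years")` yields a `Relation` with variable `age`, comparator `>=`, value `18.0`, and unit `years`. Real criteria are far less regular than this, which is why the abstract pairs the grammar with learned NER and NEL components.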
