CL · IR · Jan 24, 2019

Extracting PICO elements from RCT abstracts using 1-2gram analysis and multitask classification

Xia Yuan, Liao xiaoli, Li Shilei, Shi Qinwen, Wu Jinfa, Li Ke

arXiv:1901.08351v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the automation of evidence-based medicine summarization for medical researchers. Its contribution is incremental: it builds on existing classification methods and adds feature engineering.

The study tackled the extraction of PICO elements from RCT abstracts by developing a multitask SVM classification model with 1-2gram TF-IDF features, which achieved the best performance among the tested models on the BioNLP 2018 dataset.
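For reference, the two ingredients named above are usually written as follows. This is the standard textbook form, not necessarily the paper's exact variant: implementations often smooth the IDF term and L2-normalize each sentence vector. Treating each PICO element k as its own binary SVM task is an assumption about what "multitask" means here.

```latex
% TF-IDF weight of n-gram g (n = 1 or 2) in sentence d, over N training sentences
\mathrm{tfidf}(g,d) = \mathrm{tf}(g,d)\,\log\frac{N}{\mathrm{df}(g)}

% Soft-margin SVM for task k \in \{P, I, C, O\}; x_i is the TF-IDF vector of sentence i,
% y_i^{(k)} \in \{-1,+1\} says whether sentence i contains element k, C trades margin for slack
\min_{\mathbf{w}_k,\,b_k,\,\boldsymbol{\xi}^{(k)}}\ \frac{1}{2}\lVert\mathbf{w}_k\rVert^2 + C\sum_{i=1}^{N}\xi_i^{(k)}
\quad\text{s.t.}\quad y_i^{(k)}\left(\mathbf{w}_k^{\top}\mathbf{x}_i + b_k\right) \ge 1 - \xi_i^{(k)},\quad \xi_i^{(k)} \ge 0
```

At the optimum each slack equals the hinge loss, $\xi_i^{(k)} = \max\bigl(0,\ 1 - y_i^{(k)}(\mathbf{w}_k^{\top}\mathbf{x}_i + b_k)\bigr)$, which is the form most SVM solvers minimize directly.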

The core of evidence-based medicine is to read and analyze numerous papers in the medical literature on a specific clinical problem and to summarize the authoritative answers to that problem. To formulate a clear and focused clinical problem, the popular PICO framework is usually adopted, in which each clinical problem is considered to consist of four parts: patient/problem (P), intervention (I), comparison (C) and outcome (O). In this study, we compared several classification models that are commonly used in traditional machine learning. Next, we developed a multitask classification model based on a soft-margin SVM with a specialized feature engineering method that combines 1-2gram analysis with TF-IDF analysis. Finally, we trained and tested several generic models on an open-source dataset from BioNLP 2018. The results show that the proposed multitask SVM classification model based on 1-2gram TF-IDF features exhibits the best performance among the tested models.
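The pipeline described in the abstract can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: "multitask" is read as one binary soft-margin SVM per PICO element over a shared 1-2gram TF-IDF feature space; the SVM is trained with Pegasos subgradient descent (a swapped-in solver, where the regularization weight `lam` plays roughly the inverse role of C); and the class names, hyperparameters and toy sentences are hypothetical and do not come from the BioNLP 2018 data.

```python
import math
import random
import re
from collections import Counter

PICO_LABELS = ("P", "I", "C", "O")  # assumption: one binary task per PICO element


def ngrams(text, n_min=1, n_max=2):
    """Lower-cased word unigrams and bigrams, i.e. the 1-2gram features."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class TfidfNgrams:
    """Hypothetical 1-2gram TF-IDF vectorizer producing L2-normalized sparse dicts."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(ngrams(doc)))  # document frequency of each n-gram
        n = len(docs)
        # Smoothed IDF, ln((1 + N) / (1 + df)) + 1; an assumed variant.
        self.idf = {g: math.log((1 + n) / (1 + c)) + 1.0 for g, c in df.items()}
        return self

    def transform(self, doc):
        tf = Counter(g for g in ngrams(doc) if g in self.idf)  # unseen n-grams are dropped
        vec = {g: c * self.idf[g] for g, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vec = {g: v / norm for g, v in vec.items()}
        vec["<bias>"] = 1.0  # constant feature standing in for the intercept (it gets regularized too)
        return vec


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_soft_margin_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos on lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)), labels in {-1, +1}."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y[i] * dot(w, X[i]) < 1.0  # hinge loss active: inside the margin
            w = {k: v * (1.0 - eta * lam) for k, v in w.items()}  # regularizer shrink
            if violated:
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


class MultitaskPicoSVM:
    """Hypothetical model: one soft-margin SVM per PICO element, shared TF-IDF features."""

    def fit(self, sentences, label_sets):
        self.vec = TfidfNgrams().fit(sentences)
        X = [self.vec.transform(s) for s in sentences]
        self.w = {}
        for lab in PICO_LABELS:
            y = [1 if lab in labs else -1 for labs in label_sets]
            self.w[lab] = train_soft_margin_svm(X, y)
        return self

    def scores(self, sentence):
        x = self.vec.transform(sentence)
        return {lab: dot(w, x) for lab, w in self.w.items()}

    def predict(self, sentence):
        """All elements with a positive SVM score; fall back to the best-scoring one."""
        s = self.scores(sentence)
        labels = {lab for lab, v in s.items() if v > 0}
        return labels or {max(s, key=s.get)}


# Hypothetical toy sentences, NOT taken from the BioNLP 2018 dataset.
TRAIN = [
    ("Adults with type 2 diabetes were enrolled.", {"P"}),
    ("Patients with chronic heart failure were recruited.", {"P"}),
    ("Participants received metformin twice daily.", {"I"}),
    ("Patients received aspirin once daily.", {"I"}),
    ("Compared with placebo tablets.", {"C"}),
    ("Versus standard care alone.", {"C"}),
    ("The primary outcome was mortality at one year.", {"O"}),
    ("The primary outcome was blood pressure reduction.", {"O"}),
]

if __name__ == "__main__":
    model = MultitaskPicoSVM().fit([s for s, _ in TRAIN], [labs for _, labs in TRAIN])
    for sent in ["Elderly patients with hypertension were enrolled.",
                 "The secondary outcome was hospital admission."]:
        print(sent, "->", sorted(model.predict(sent)))
```

Because each element has its own classifier, a sentence can receive more than one label, for instance one that names both the intervention and the comparator. The fallback to the highest-scoring element when no score is positive is a hypothetical design choice that keeps every sentence labeled.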

