DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations
The DSR-collection provides a benchmark for medical NLP tasks such as diagnosis assistance, but the contribution is incremental: it pairs a new dataset with baselines built on existing methods.
The authors address the lack of a systematic evaluation collection for graded disease-symptom relation extraction by introducing the DSR-collection, annotated by physicians. They show that their adapted co-occurrence method, which considers the full text of medical articles in addition to keywords, improves nDCG, precision, and recall.
The effective extraction of ranked disease-symptom relationships is a critical component in various medical tasks, including computer-assisted medical diagnosis and the discovery of unexpected associations between diseases. While existing disease-symptom relationship extraction methods serve as the foundation for these tasks, no collection is available to systematically evaluate their performance. In this paper, we introduce the Disease-Symptom Relation collection (DSR-collection), created by five fully trained physicians as expert annotators. We provide graded symptom judgments for diseases by differentiating between "symptoms" and "primary symptoms". Further, we provide several strong baselines, based on the methods used in previous studies. The first method is based on word embeddings, and the second on co-occurrences of keywords in medical articles. For the co-occurrence method, we propose an adaptation in which not only keywords are considered, but also the full text of medical articles. The evaluation on the DSR-collection shows the effectiveness of the proposed adaptation in terms of nDCG, precision, and recall.
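To make the evaluation setup concrete, the following is a minimal sketch of how a ranked symptom list could be scored against graded judgments with nDCG, precision, and recall. It is an illustration under stated assumptions, not the authors' evaluation code: the numeric gains (2 for "primary symptom", 1 for "symptom"), the cutoff k, the function names, and the toy judgments are all hypothetical and do not come from the DSR-collection.

```python
import math

# Assumed gain values: the source only distinguishes "symptoms" from
# "primary symptoms"; mapping them to 1 and 2 is a choice made for this sketch.
GRADES = {"primary symptom": 2, "symptom": 1}


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked, judgments, k):
    """nDCG@k of a ranked symptom list; unjudged symptoms count as gain 0."""
    gains = [judgments.get(s, 0) for s in ranked[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def precision_at_k(ranked, judgments, k):
    """Fraction of the top-k symptoms that have any positive grade."""
    hits = sum(1 for s in ranked[:k] if judgments.get(s, 0) > 0)
    return hits / k


def recall_at_k(ranked, judgments, k):
    """Fraction of all positively graded symptoms found in the top k."""
    relevant = sum(1 for g in judgments.values() if g > 0)
    hits = sum(1 for s in ranked[:k] if judgments.get(s, 0) > 0)
    return hits / relevant if relevant else 0.0


# Toy judgments for one disease (illustrative only, not taken from the collection).
judgments = {
    "fever": GRADES["primary symptom"],
    "cough": GRADES["primary symptom"],
    "headache": GRADES["symptom"],
}
# Hypothetical system output: symptoms ranked by decreasing score.
ranked = ["cough", "nausea", "fever", "headache"]

print("nDCG@3:", round(ndcg_at_k(ranked, judgments, 3), 4))
print("P@3:", round(precision_at_k(ranked, judgments, 3), 4))
print("R@3:", round(recall_at_k(ranked, judgments, 3), 4))
```

Precision and recall treat both grades as simply relevant, whereas nDCG rewards placing primary symptoms above ordinary ones, which is why a graded collection makes nDCG the more informative of the three measures.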