A Medical Literature Search System for Identifying Effective Treatments in Precision Medicine
This addresses the challenge for physicians in finding effective treatments in precision medicine by improving search accuracy, though it is incremental as it builds on existing methods and benchmarks.
The paper tackles the problem of low recall and precision in medical literature search for precision medicine by proposing a retrieval system that expands disease and gene terms and uses a machine learning reranker, achieving performance comparable to top entries on the 2018 PM track leaderboard.
The Precision Medicine Initiative states that treatments for a patient should take into account not only the patient's disease, but his/her specific genetic variation as well. The vast biomedical literature holds the potential for physicians to identify effective treatment options for a cancer patient. However, the complexity and ambiguity of medical terms can result in vocabulary mismatch between the physician's query and the literature. The physician's search intent (finding treatments instead of other types of studies) is difficult to explicitly formulate in a query. Therefore, simple ad hot retrieval approach will suffer from low recall and precision. In this paper, we propose a new retrieval system that helps physicians identify effective treatments in precision medicine. Given a cancer patient with a specific disease, genetic variation, and demographic information, the system aims to identify biomedical publications that report effective treatments. We approach this goal from two directions. First, we expand the original disease and gene terms using biomedical knowledge bases to improve recall of the initial retrieval. We then improve precision by promoting treatment-related publications to the top using a machine learning reranker trained on 2017 Text Retrieval Conference Precision Medicine (PM) track corpus. Batch evaluation results on 2018 PM track corpus show that the proposed approach effectively improves both recall and precision, achieving performance comparable to the top entries on the leaderboard of 2018 PM track.