Benchmarking Clinical Decision Support Search
This work tackles inconsistent benchmarking in clinical literature search for medical researchers, but it is incremental: it builds on existing TREC track data without introducing new methods.
Platform variability makes it difficult to compare the diverse search methods used in clinical decision support. The paper addresses this by benchmarking on a single stable platform and reproducing leading teams' runs, which enables statistical hypothesis testing for further research.
Finding relevant literature underpins the practice of evidence-based medicine. From 2014 to 2016, TREC conducted a clinical decision support track, wherein participants were tasked with finding articles relevant to clinical questions posed by physicians. In total, 87 teams participated over these three years, generating 395 runs. During this period, teams trialled a variety of methods. While there was significant overlap in the methods employed by different teams, the results varied. Because the teams used diverse platforms, the results arising from the different techniques are not directly comparable, which reduces the ability to build on previous work. By using a stable platform, we have been able to compare different document and query processing techniques and to experiment with different search parameters. We have used our system to reproduce leading teams' runs and compare the results obtained. By benchmarking our indexing and search techniques, we can statistically test a variety of hypotheses, paving the way for further research.
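
As an illustration of the kind of hypothesis testing that a shared platform makes possible, the sketch below compares two runs on per-topic scores using a paired randomization (sign-flip) test. The abstract does not state which test the authors use, so the choice of test is an assumption, as are the function name `paired_randomization_test`, its parameters, and the per-topic scores, which are invented for illustration and are not results from the track.

```python
import random
from statistics import mean


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization (sign-flip) test on per-topic scores.

    Hypothetical helper for illustration; returns an estimated p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both runs must be scored on the same topics")

    # Per-topic differences between the two runs.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the two runs are interchangeable on each
        # topic, so each difference is equally likely to carry either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1

    # Add-one smoothing keeps the estimate strictly above zero.
    return (extreme + 1) / (trials + 1)


# Invented per-topic scores for two runs over the same eight topics.
baseline = [0.21, 0.35, 0.10, 0.42, 0.28, 0.17, 0.30, 0.25]
variant = [0.26, 0.39, 0.12, 0.47, 0.33, 0.20, 0.36, 0.31]

p_value = paired_randomization_test(baseline, variant)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value would suggest that the difference between the two runs is unlikely to be due to chance across topics. The comparison is only meaningful when both runs are produced and scored under the same conditions, which is the motivation for a stable platform.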