What Would it Take to get Biomedical QA Systems into Practice?
This work aims to address low trust in, and low adoption of, QA systems among clinicians, but its contribution is incremental: it assesses existing work against a set of criteria rather than introducing new methods.
The paper addresses the lack of clinical adoption of biomedical question answering (QA) systems by identifying trust and transparency issues as key barriers, proposing criteria to improve utility, and assessing existing models, tasks, and datasets against those criteria to guide future development.
Medical question answering (QA) systems have the potential to answer clinicians' uncertainties about treatment and diagnosis on demand, informed by the latest evidence. However, despite the significant progress in general QA made by the NLP community, medical QA systems are still not widely used in clinical environments. One likely reason for this is that clinicians may not readily trust QA system outputs, in part because transparency, trustworthiness, and provenance have not been key considerations in the design of such models. In this paper, we discuss a set of criteria that, if met, we argue would likely increase the utility of biomedical QA systems, which may in turn lead to adoption of such systems in practice. We assess existing models, tasks, and datasets with respect to these criteria, highlighting shortcomings of previously proposed approaches and pointing toward what might be more usable QA systems.