CL | Oct 2, 2022

Risk-graded Safety for Handling Medical Queries in Conversational AI

arXiv:2210.00572v1 | 300 citations | h-index: 38
Originality: Synthesis-oriented
AI Analysis

This paper addresses safety risks that arise when conversational AI handles medical queries, a problem critical to user health. The contribution is incremental, however, since it builds on existing annotation and classification methods.

The study tackles unsafe behavior by conversational AI systems in response to medical queries. The authors build a corpus of queries and responses labeled with both crowdsourced and expert annotations. They find that aggregated crowd labels align with professional opinion on identifying medical queries and recognizing risk types, but that automated classification requires caution because its errors can be serious.

Conversational AI systems can engage in unsafe behaviour when handling users' medical queries that can have severe consequences and could even lead to deaths. Systems therefore need to be capable of both recognising the seriousness of medical inputs and producing responses with appropriate levels of risk. We create a corpus of human written English language medical queries and the responses of different types of systems. We label these with both crowdsourced and expert annotations. While individual crowdworkers may be unreliable at grading the seriousness of the prompts, their aggregated labels tend to agree with professional opinion to a greater extent on identifying the medical queries and recognising the risk types posed by the responses. Results of classification experiments suggest that, while these tasks can be automated, caution should be exercised, as errors can potentially be very serious.
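The comparison between aggregated crowd labels and expert opinion can be illustrated with a minimal sketch. It assumes majority voting as the aggregation rule, which the abstract does not specify, and the label names and data below are hypothetical, not taken from the corpus.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def agreement_rate(crowd_labels, expert_labels):
    """Fraction of items where the aggregated crowd label matches the expert label."""
    aggregated = [majority_vote(item) for item in crowd_labels]
    matches = sum(a == e for a, e in zip(aggregated, expert_labels))
    return matches / len(expert_labels)


# Hypothetical annotations: three crowdworkers per query (assumed label set).
crowd = [
    ["serious", "serious", "non-serious"],
    ["non-medical", "non-medical", "non-medical"],
    ["serious", "non-serious", "non-serious"],
]
# Hypothetical expert label for each of the same three queries.
expert = ["serious", "non-medical", "serious"]

rate = agreement_rate(crowd, expert)
print(f"Agreement with expert labels: {rate:.2f}")
```

In this toy data the third query shows the failure mode the abstract warns about: the crowd majority under-grades a query the expert considers serious, so a raw agreement rate alone can hide high-cost errors.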

Code Implementations: 1 repo