Conversational Assistants to support Heart Failure Patients: comparing a Neurosymbolic Architecture with ChatGPT
This work addresses the need for controlled evaluations of conversational assistants in healthcare for heart failure patients, though it is incremental in comparing existing architectures.
The study compared a neurosymbolic conversational assistant with a ChatGPT-based one, both of which let heart failure patients ask about salt content in food. The neurosymbolic system was more accurate and less verbose, while the ChatGPT-based system made fewer speech errors and required fewer clarifications; patients showed no preference for either.
Conversational assistants are becoming increasingly popular, including in healthcare, partly because of the availability and capabilities of Large Language Models. There is a need for controlled, probing evaluations with real stakeholders that can highlight the advantages and disadvantages of more traditional architectures and of those based on generative AI. We present a within-group user study comparing two versions of a conversational assistant that allows heart failure patients to ask about salt content in food. One version was developed in-house with a neurosymbolic architecture; the other is based on ChatGPT. The evaluation shows that the in-house system is more accurate, completes more tasks, and is less verbose than the ChatGPT-based one; on the other hand, the ChatGPT-based system makes fewer speech errors and requires fewer clarifications to complete the task. Patients show no preference for one over the other.