CL, LG | Sep 28, 2024

HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations

arXiv:2409.19487v4 | 45 citations | h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses the need for systematic evaluation of questioning abilities in digital healthcare, though its contribution is incremental because it builds on existing LLM techniques such as RAG and CoT.

The paper evaluates how well large language model (LLM) chains ask questions to gather patient information in healthcare conversations. It introduces HealthQ, a framework that assesses these capabilities, and reports that the evaluation is robust across multiple LLM judges and datasets.

Effective patient care in digital healthcare requires large language models (LLMs) that not only answer questions but also actively gather critical information through well-crafted inquiries. This paper introduces HealthQ, a novel framework for evaluating the questioning capabilities of LLM healthcare chains. By implementing advanced LLM chains, including Retrieval-Augmented Generation (RAG), Chain of Thought (CoT), and reflective chains, HealthQ assesses how effectively these chains elicit comprehensive and relevant patient information. To achieve this, we integrate an LLM judge to evaluate generated questions across metrics such as specificity, relevance, and usefulness, while aligning these evaluations with traditional Natural Language Processing (NLP) metrics like ROUGE and Named Entity Recognition (NER)-based set comparisons. We validate HealthQ using two custom datasets constructed from public medical datasets, ChatDoctor and MTS-Dialog, and demonstrate its robustness across multiple LLM judge models, including GPT-3.5, GPT-4, and Claude. Our contributions are threefold: we present the first systematic framework for assessing questioning capabilities in healthcare conversations, establish a model-agnostic evaluation methodology, and provide empirical evidence linking high-quality questions to improved patient information elicitation.
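The abstract pairs LLM-judge scores with traditional metrics such as ROUGE and NER-based set comparisons. As a rough illustration only, and not the paper's implementation, the sketch below computes a unigram ROUGE-1 recall and a precision/recall pair over entity sets. The function names are hypothetical, the tokenizer is a simple regex, and the entity sets are assumed to be supplied already extracted (no NER model is run here).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_recall(reference, candidate):
    """Fraction of reference unigrams that also appear in the candidate."""
    ref = Counter(tokenize(reference))
    cand = Counter(tokenize(candidate))
    if not ref:
        return 0.0
    # Clip each token's overlap at its count in the candidate.
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


def entity_set_scores(reference_entities, elicited_entities):
    """Precision and recall of elicited entities against reference entities.

    Both arguments are iterables of entity strings, assumed to come from
    some upstream NER step that is outside this sketch.
    """
    ref = {e.lower() for e in reference_entities}
    got = {e.lower() for e in elicited_entities}
    hits = len(ref & got)
    precision = hits / len(got) if got else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example: ground-truth patient statement vs. elicited answer.
    print(rouge1_recall("chest pain and fever", "patient reports chest pain"))
    print(entity_set_scores({"chest pain", "fever"}, {"Fever", "cough"}))
```

In a setup like the one the abstract describes, scores of this kind would be computed on the patient information a question elicits and then compared against the LLM judge's ratings of that question; how the paper actually aligns the two is not specified in this summary.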
