CL · AI · Jul 24, 2024

SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

Bernardo Consoli, Xizhi Wu, Song Wang, Xinyu Zhao, Yanshan Wang, Justin Rousseau, Tom Hartvigsen, Li Shen, Huanmei Wu, Yifan Peng, Qi Long, Tianlong Chen

arXiv:2407.17126v1 · 4 citations · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses the labor-intensive annotation bottleneck in medical informatics. It appears incremental, though, because it combines existing LLM techniques with XGBoost.

The researchers extracted social determinants of health from unstructured medical notes with SDoH-GPT, a few-shot LLM method. It cut time tenfold and cost twentyfold and reached a Cohen's kappa of up to 0.92 against human annotators. Combined with XGBoost, it maintained AUROC scores of 0.90 or higher.
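Cohen's kappa measures agreement between two annotators, such as a human and the LLM, after correcting for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement predicted from each annotator's label frequencies. The minimal sketch below is a generic implementation of that formula and is not the paper's evaluation code. The function name `cohens_kappa` and the toy labels are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items (illustrative helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Degenerate case: both raters used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: 1 = SDoH mention present, 0 = absent.
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
llm = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(cohens_kappa(human, llm), 3))  # → 0.8
```

In this toy case the raters agree on 9 of 10 notes (p_o = 0.9), and chance alone would give p_e = 0.5, so kappa is 0.8. Raw agreement overstates consistency whenever one label dominates, which is why kappa is the usual choice for comparing an LLM annotator against humans.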

Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations. These annotations are typically task-specific, which hampers reusability and limits sharing. In this study, we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method. It uses contrastive examples and concise instructions to extract SDoH without relying on extensive medical annotations or costly human intervention. It reduced time tenfold and cost twentyfold. It also showed superior consistency with human annotators, as measured by a Cohen's kappa of up to 0.92. Combining SDoH-GPT with XGBoost leverages the strengths of both, providing high accuracy and computational efficiency while consistently maintaining AUROC scores of 0.90 or higher. Testing on three distinct datasets confirmed its robustness and accuracy. This study highlights the potential of LLMs to revolutionize medical note classification, achieving highly accurate classifications with significantly reduced time and cost.
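The abstract does not give the prompt format, so the following is only a hypothetical sketch of what a few-shot prompt with contrastive examples and concise instructions might look like. It assumes that "contrastive" means pairing each positive example with a near-miss negative for the same SDoH category. The names `Example`, `build_contrastive_prompt`, and `parse_answer`, the yes/no answer format, and the example notes are all illustrative and do not come from the paper. The LLM call itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Example:
    note: str   # short clinical-note snippet (hypothetical)
    label: str  # "yes" or "no" for one SDoH category


def build_contrastive_prompt(category, definition, pairs, query_note):
    """Assemble a concise few-shot prompt from (positive, negative) example pairs.

    Assumption: each pair shows a true mention next to a similar-looking non-mention,
    so the model sees where the category boundary lies.
    """
    lines = [
        f"Task: Does the note mention {category}? Answer 'yes' or 'no'.",
        f"Definition: {definition}",
        "",
    ]
    for positive, negative in pairs:
        for ex in (positive, negative):
            lines += [f"Note: {ex.note}", f"Answer: {ex.label}", ""]
    lines += [f"Note: {query_note}", "Answer:"]
    return "\n".join(lines)


def parse_answer(reply):
    """Map a free-text LLM reply to 1 (yes), 0 (no), or None if unparseable."""
    tokens = reply.strip().lower().split()
    if not tokens:
        return None
    first = tokens[0].strip(".,!:;'\"")
    return {"yes": 1, "no": 0}.get(first)


pairs = [(
    Example("Patient is currently homeless and staying at a shelter.", "yes"),
    Example("Patient lives alone in an apartment near her daughter.", "no"),
)]
prompt = build_contrastive_prompt(
    "housing instability",
    "Lack of stable, safe, or adequate housing.",
    pairs,
    "Reports being evicted last month and sleeping in his car.",
)
print(prompt)
```

Under the same assumptions, the 0/1 labels that `parse_answer` extracts from LLM replies could serve as training targets for a conventional classifier such as XGBoost. That is one plausible reading of how the two components are combined, not a description of the authors' pipeline.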

