CL Jul 24, 2024
SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)
Bernardo Consoli, Xizhi Wu, Song Wang et al.
Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method leveraging contrastive examples and concise instructions to extract SDoH without relying on extensive medical annotations or costly human intervention. It achieved tenfold and twentyfold reductions in time and cost respectively, and superior consistency with human annotators measured by Cohen's kappa of up to 0.92. The innovative combination of SDoH-GPT and XGBoost leverages the strengths of both, ensuring high accuracy and computational efficiency while consistently maintaining 0.90+ AUROC scores. Testing across three distinct datasets has confirmed its robustness and accuracy. This study highlights the potential of leveraging LLMs to revolutionize medical note classification, demonstrating their capability to achieve highly accurate classifications with significantly reduced time and cost.
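The abstract reports agreement with human annotators as Cohen's kappa. As a minimal sketch of how that statistic is computed for two sets of labels, the function below uses the standard definition, kappa = (p_o - p_e) / (1 - p_e). The function name and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels (1 = SDoH mentioned, 0 = not), not data from the paper.
human = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
print(round(cohens_kappa(human, model), 3))
```

Here the raters agree on 9 of 10 items while chance agreement is 0.5, so kappa corrects the raw 0.9 agreement downward.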
AI Nov 29, 2024
Integrating Social Determinants of Health into Knowledge Graphs: Evaluating Prediction Bias and Fairness in Healthcare
Tianqi Shang, Weiqing He, Tianlong Chen et al.
Social determinants of health (SDoH) play a crucial role in patient health outcomes, yet their integration into biomedical knowledge graphs remains underexplored. This study addresses this gap by constructing an SDoH-enriched knowledge graph using the MIMIC-III dataset and PrimeKG. We introduce a novel fairness formulation for graph embeddings, focusing on invariance with respect to sensitive SDoH information. By employing a heterogeneous-GCN model for drug-disease link prediction, we detect biases related to various SDoH factors. To mitigate these biases, we propose a post-processing method that strategically reweights edges connected to SDoHs, balancing their influence on graph representations. This approach represents one of the first comprehensive investigations into fairness issues within biomedical knowledge graphs incorporating SDoH. Our work not only highlights the importance of considering SDoH in medical informatics but also provides a concrete method for reducing SDoH-related biases in link prediction tasks, paving the way for more equitable healthcare recommendations. Our code is available at https://github.com/hwq0726/SDoH-KG.
43.9 AI Apr 13
DreamKG: A KG-Augmented Conversational System for People Experiencing Homelessness
Javad M Alizadeh, Genhui Zheng, Chiu C Tan et al.
People experiencing homelessness (PEH) face substantial barriers to accessing timely, accurate information about community services. DreamKG addresses this through a knowledge graph-augmented conversational system that grounds responses in verified, up-to-date data about Philadelphia organizations, services, locations, and hours. Unlike standard large language models (LLMs) prone to hallucinations, DreamKG combines Neo4j knowledge graphs with structured query understanding to handle location-aware and time-sensitive queries reliably. The system performs spatial reasoning for distance-based recommendations and temporal filtering for operating hours. Preliminary evaluation shows 59% superiority over Google Search AI on relevant queries and 84% rejection of irrelevant queries. This demonstration highlights the potential of hybrid architectures that combine LLM flexibility with knowledge graph reliability to improve service accessibility for vulnerable populations.
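The distance-based recommendation and operating-hours filtering described above can be illustrated with a small sketch. It is not DreamKG's implementation, which queries a Neo4j knowledge graph. It assumes services are plain dictionaries with hypothetical fields (`name`, `lat`, `lon`, `opens`, `closes`), uses the haversine great-circle distance, and ignores hours that run past midnight.

```python
import math
from datetime import time


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def open_services_nearby(services, lat, lon, now, max_km):
    """Names of services open at `now` within `max_km`, nearest first."""
    hits = []
    for s in services:
        # Temporal filter: same-day hours only (no overnight ranges).
        if not (s["opens"] <= now < s["closes"]):
            continue
        # Spatial filter: keep services within the distance budget.
        d = haversine_km(lat, lon, s["lat"], s["lon"])
        if d <= max_km:
            hits.append((d, s["name"]))
    return [name for _, name in sorted(hits)]


# Hypothetical records for illustration; not real organizations.
services = [
    {"name": "Shelter A", "lat": 39.95, "lon": -75.16,
     "opens": time(8, 0), "closes": time(20, 0)},
    {"name": "Food Bank B", "lat": 39.96, "lon": -75.16,
     "opens": time(9, 0), "closes": time(12, 0)},
    {"name": "Clinic C", "lat": 40.95, "lon": -75.16,
     "opens": time(8, 0), "closes": time(20, 0)},
]
print(open_services_nearby(services, 39.95, -75.16, time(14, 30), max_km=5))
```

At 14:30 the food bank is closed and the clinic is roughly 111 km away, so only the shelter is returned.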
QM Dec 12, 2024
Predicting Emergency Department Visits for Patients with Type II Diabetes
Javad M Alizadeh, Jay S Patel, Gabriel Tajeu et al.
Over 30 million Americans are affected by Type II diabetes (T2D), a treatable condition with significant health risks. This study aims to develop and validate predictive models using machine learning (ML) techniques to estimate emergency department (ED) visits among patients with T2D. Data for these patients were obtained from the HealthShare Exchange (HSX), focusing on demographic details, diagnoses, and vital signs. Our sample contained 34,151 patients diagnosed with T2D, who accounted for 703,065 visits overall between 2017 and 2021. A workflow integrated EMR data with SDoH for ML predictions. A total of 87 out of 2,555 features were selected for model construction. Various machine learning algorithms, including CatBoost, Ensemble Learning, K-nearest Neighbors (KNN), Support Vector Classification (SVC), Random Forest, and Extreme Gradient Boosting (XGBoost), were employed with tenfold cross-validation to predict whether a patient is at risk of an ED visit. The areas under the ROC curve for Random Forest, XGBoost, Ensemble Learning, CatBoost, KNN, and SVC were 0.82, 0.82, 0.82, 0.81, 0.72, and 0.68, respectively. Ensemble Learning and Random Forest models demonstrated superior predictive performance in terms of discrimination, calibration, and clinical applicability. These models are reliable tools for predicting risk of ED visits among patients with T2D. They can estimate future ED demand and assist clinicians in identifying critical factors associated with ED utilization, enabling early interventions to reduce such visits. The top five important features were age, the difference between visitation gaps, visitation gaps, the diagnosis code R10 (abdominal and pelvic pain), and the Index of Concentration at the Extremes (ICE) for income.
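The abstract names the Index of Concentration at the Extremes (ICE) for income as a top feature but does not state how it was computed. As a sketch, the commonly used definition is ICE = (A - P) / T, where A is the number of residents in the most privileged income group, P the number in the most deprived group, and T the total population with known income in the area. The function name, the example counts, and the choice of income cut-offs are assumptions for illustration.

```python
def ice_income(privileged, deprived, total):
    """ICE = (A - P) / T for one area; ranges from -1 to 1.

    privileged: residents in the highest income group (A)
    deprived:   residents in the lowest income group (P)
    total:      residents with known income (T)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if privileged < 0 or deprived < 0 or privileged + deprived > total:
        raise ValueError("group counts must be non-negative and fit in total")
    return (privileged - deprived) / total


# Hypothetical area: 300 high-income, 100 low-income, 1000 residents.
print(ice_income(300, 100, 1000))
```

A value near 1 means the area is concentrated at the privileged extreme, a value near -1 at the deprived extreme, and a value near 0 means either balance or few residents at both extremes.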
CR Feb 23, 2021
Usability and Security of Different Authentication Methods for an Electronic Health Records System
Saptarshi Purkayastha, Shreya Goyal, Bolu Oluwalade et al.
We conducted a survey of 67 graduate students enrolled in the Privacy and Security in Healthcare course at Indiana University Purdue University Indianapolis. This was done to measure user preference and their understanding of usability and security of three different Electronic Health Records authentication methods: single authentication method (username and password), Single sign-on with Central Authentication Service (CAS) authentication method, and a bio-capsule facial authentication method. This research aims to explore the relationship between security and usability, and measure the effect of perceived security on usability in these three authentication methods. We developed a formative-formative Partial Least Square Structural Equation Modeling (PLS-SEM) model to measure the relationship between the latent variables of Usability and Security. The measurement model was developed using five observed variables (measures): Efficiency and Effectiveness, Satisfaction, Preference, Concerns, and Confidence. The results obtained highlight the importance and impact of these measures on the latent variables and the relationship among the latent variables. From the PLS-SEM analysis, it was found that security has a positive impact on usability for Single sign-on and bio-capsule facial authentication methods. We conclude that the facial authentication method was the most secure and usable among the three authentication methods. Further, descriptive analysis was done to draw out the interesting findings from the survey regarding the observed variables.