Stephen Liu

3 papers

83 citations

Novelty: 48%

AI Score: 38

Ranked #107,165 of 205,806 authors (top 52%); #6,961 in AI (top 49%)

3 Papers

LG Apr 15, 2022

Detection of sepsis during emergency department triage using machine learning

Oleksandr Ivanov, Karin Molander, Robert Dunne et al.

Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Even a few hours of delay in the treatment of sepsis results in increased mortality. Early detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to compare sepsis detection performance at ED triage (prior to the use of laboratory diagnostics) of the standard sepsis screening algorithm (SIRS with source of infection) and a machine learning algorithm trained on EHR triage data. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis and standard screening were retrospectively evaluated on the adult population of 512,949 medical records. KATE Sepsis demonstrates an AUC of 0.9423 (0.9401 - 0.9441) with sensitivity of 71.09% (70.12% - 71.98%) and specificity of 94.81% (94.75% - 94.87%). Standard screening demonstrates an AUC of 0.6826 (0.6774 - 0.6878) with sensitivity of 40.8% (39.71% - 41.86%) and specificity of 95.72% (95.68% - 95.78%). The KATE Sepsis model trained to detect sepsis demonstrates 77.67% (75.78% - 79.42%) sensitivity in detecting severe sepsis and 86.95% (84.2% - 88.81%) sensitivity in detecting septic shock. The standard screening protocol demonstrates 43.06% (41% - 45.87%) sensitivity in detecting severe sepsis and 40% (36.55% - 43.26%) sensitivity in detecting septic shock. Future research should focus on the prospective impact of KATE Sepsis on administration of antibiotics, readmission rate, morbidity and mortality.
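
For readers unfamiliar with the two headline metrics in this abstract, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are made up for illustration and are not taken from the study, which reports only the resulting percentages; the function names are likewise hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positive cases that the screen flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual negative cases that the screen clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT from the paper: 1,000 septic and 10,000
# non-septic encounters, chosen only to make the arithmetic easy to follow.
sens = sensitivity(true_pos=711, false_neg=289)
spec = specificity(true_neg=9481, false_pos=519)

print(f"sensitivity = {sens:.2%}")
print(f"specificity = {spec:.2%}")
```

A screen can have high specificity and still miss many true cases, which is the pattern the abstract reports for standard screening (specificity about 96%, sensitivity about 41%).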

AI Feb 18

SourceBench: Can AI Answers Reference Quality Web Sources?

Hexi Jin, Stephen Liu, Yuheng Li et al.

Large language models (LLMs) increasingly answer queries by citing web sources, but existing evaluations emphasize answer correctness rather than evidence quality. We introduce SourceBench, a benchmark for measuring the quality of cited web sources across 100 real-world queries spanning informational, factual, argumentative, social, and shopping intents. SourceBench uses an eight-metric framework covering content quality (content relevance, factual accuracy, objectivity) and page-level signals (e.g., freshness, authority/accountability, clarity), and includes a human-labeled dataset with a calibrated LLM-based evaluator that matches expert judgments closely. We evaluate eight LLMs, Google Search, and three AI search tools over 3996 cited sources using SourceBench and conduct further experiments to understand the evaluation results. Overall, our work reveals four key new insights that can guide future research in the direction of GenAI and web search.

CY Mar 29, 2020

Improving Emergency Department ESI Acuity Assignment Using Machine Learning and Clinical Natural Language Processing

Oleksandr Ivanov, Lisa Wolf, Deena Brecher et al.

Effective triage is critical to mitigating the effect of increased volume by accurately determining patient acuity and need for resources, and by establishing effective acuity-based patient prioritization. The purpose of this retrospective study was to determine whether historical EHR data can be extracted and synthesized with clinical natural language processing (C-NLP) and the latest ML algorithms (KATE) to produce highly accurate ESI predictive models. An ML model (KATE) for the triage process was developed using 166,175 patient encounters from two participating hospitals. The model was then tested against a gold set derived from a random sample of triage encounters at the study sites, for which study clinicians recorded correct acuity assignments using the Emergency Severity Index (ESI) standard as a guide. At the two study sites, KATE predicted accurate ESI acuity assignments 75.9% of the time, compared to nurses (59.8%) and average individual study clinicians (75.3%). In relative terms, KATE accuracy was 26.9% higher than the average nurse accuracy (p-value < 0.0001). On the boundary between ESI 2 and ESI 3 acuity assignments, which relates to the risk of decompensation, KATE reached 80% accuracy, 93.2% higher in relative terms than the triage nurses' 41.4% accuracy (p-value < 0.0001). KATE provides a triage acuity assignment substantially more accurate than the triage nurses in this study sample. KATE operates independently of contextual factors, unaffected by the external pressures that can cause undertriage, and may mitigate the racial and social biases that can negatively affect the accuracy of triage assignment. Future research should focus on the impact of KATE providing feedback to triage nurses in real time, and on KATE's impact on mortality and morbidity, ED throughput, resource optimization, and nursing outcomes.
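
The "26.9% higher" and "93.2% higher" figures in this abstract are relative improvements over the nurse baseline, not percentage-point differences. A minimal sketch of that arithmetic, using only the accuracies quoted above (the helper name `relative_gain` is an illustrative choice, not something from the paper):

```python
def relative_gain(model_acc: float, baseline_acc: float) -> float:
    """Relative improvement of the model over the baseline, in percent."""
    return (model_acc - baseline_acc) / baseline_acc * 100.0


# Overall ESI accuracy: KATE 75.9% vs. triage nurses 59.8%.
overall = relative_gain(75.9, 59.8)

# ESI 2 / ESI 3 boundary: KATE 80% vs. triage nurses 41.4%.
boundary = relative_gain(80.0, 41.4)

print(f"overall:  {overall:.1f}% relative gain")   # 26.9%
print(f"boundary: {boundary:.1f}% relative gain")  # 93.2%
```

The corresponding absolute differences are 16.1 and 38.6 percentage points, respectively.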