Lauren A. Lepow

CL
h-index32
3papers
22citations
Novelty17%
AI Score28

3 Papers

CLSep 29, 2023
Clinical Text Deduplication Practices for Efficient Pretraining and Improved Clinical Tasks

Isotta Landi, Eugenia Alleva, Alissa A. Valentine et al.

Despite being a unique source of information on patients' status and disease progression, clinical notes are characterized by high levels of duplication and information redundancy. In general domain text, it has been shown that deduplication does not harm language model (LM) pretraining, thus helping reduce the training cost. Although large LMs have proven to learn medical knowledge, they still require specialized domain adaptation for improved downstream clinical tasks. By leveraging large real-world clinical corpora, we first provided a fine-grained characterization of duplicates stemming from common writing practices and clinical relevancy. Second, we demonstrated that deduplicating clinical text can help clinical LMs encode less redundant information in a more efficient manner and do not harm classification tasks via prompt-based learning.

CLMar 25, 2024Code
Extracting Social Support and Social Isolation Information from Clinical Psychiatry Notes: Comparing a Rule-based NLP System and a Large Language Model

Braja Gopal Patra, Lauren A. Lepow, Praneet Kasi Reddy Jagadeesh Kumar et al.

Background: Social support (SS) and social isolation (SI) are social determinants of health (SDOH) associated with psychiatric outcomes. In electronic health records (EHRs), individual-level SS/SI is typically documented as narrative clinical notes rather than structured coded data. Natural language processing (NLP) algorithms can automate the otherwise labor-intensive process of data extraction. Data and Methods: Psychiatric encounter notes from Mount Sinai Health System (MSHS, n=300) and Weill Cornell Medicine (WCM, n=225) were annotated and established a gold standard corpus. A rule-based system (RBS) involving lexicons and a large language model (LLM) using FLAN-T5-XL were developed to identify mentions of SS and SI and their subcategories (e.g., social network, instrumental support, and loneliness). Results: For extracting SS/SI, the RBS obtained higher macro-averaged f-scores than the LLM at both MSHS (0.89 vs. 0.65) and WCM (0.85 vs. 0.82). For extracting subcategories, the RBS also outperformed the LLM at both MSHS (0.90 vs. 0.62) and WCM (0.82 vs. 0.81). Discussion and Conclusion: Unexpectedly, the RBS outperformed the LLMs across all metrics. Intensive review demonstrates that this finding is due to the divergent approach taken by the RBS and LLM. The RBS were designed and refined to follow the same specific rules as the gold standard annotations. Conversely, the LLM were more inclusive with categorization and conformed to common English-language understanding. Both approaches offer advantages and are made available open-source for future testing.

OTSep 2, 2025
Quantifying Clinician Bias and its Effects on Schizophrenia Diagnosis in the Emergency Department of the Mount Sinai Health System

Alissa A. Valentine, Lauren A. Lepow, Lili Chan et al.

In the United States, schizophrenia (SCZ) carries a race and sex disparity that may be explained by clinician bias - a belief held by a clinician about a patient that prevents impartial clinical decision making. The emergency department (ED) is marked by higher rates of stress that lead to clinicians relying more on implicit biases during decision making. In this work, we considered a large cohort of psychiatric patients in the ED from the Mount Sinai Health System (MSHS) in New York City to investigate the effects of clinician bias on SCZ diagnosis while controlling for known risk factors and patient sociodemographic information. Clinician bias was quantified as the ratio of negative to total sentences within a patient's first ED note. We utilized a logistic regression to predict SCZ diagnosis given patient race, sex, age, history of trauma or substance use disorder, and the ratio of negative sentences. Our findings showed that an increased ratio of negative sentences is associated with higher odds of obtaining a SCZ diagnosis [OR (95% CI)=1.408 (1.361-1.456)]. Identifying as male [OR (95% CI)=1.112 (1.055-1.173)] or Black [OR (95% CI)=1.081(1.031-1.133)] increased one's odds of being diagnosed with SCZ. However, from an intersectional lens, Black female patients with high SES have the highest odds of obtaining a SCZ diagnosis [OR (95% CI)=1.629 (1.535-1.729)]. Results such as these suggest that SES does not act as a protective buffer against SCZ diagnosis in all patients, demanding more attention to the quantification of health disparities. Lastly, we demonstrated that clinician bias is operational with real world data and related to increased odds of obtaining a stigmatizing diagnosis such as SCZ.