CL, Jul 11, 2025

Application of CARE-SD text classifier tools to assess distribution of stigmatizing and doubt-marking language features in EHR

Drew Walker, Jennifer Love, Swati Rajwal, Isabel C Walker, Hannah LF Cooper, Abeed Sarker, Melvin Livingston

arXiv:2507.08969v1, h-index: 7

Originality: Synthesis-oriented

AI Analysis

This work addresses the perpetuation of patient stigmatization in healthcare settings, but its contribution is incremental: it applies existing classifier tools to a new dataset.

The study tackled the problem of stigmatizing and doubt-marking language in electronic health records (EHR) by applying text classifier tools to MIMIC-III data, finding higher rates of such language among Black or African American patients (RR: 1.16), those with government insurance (RR: 2.46), and specific provider types like social workers (RR: 2.25).

Introduction: Electronic health records (EHR) are a critical medium through which patient stigmatization is perpetuated among healthcare teams. Methods: We identified linguistic features of doubt markers and stigmatizing labels in MIMIC-III EHR via expanded lexicon matching and supervised learning classifiers. Predictors of rates of linguistic features were assessed using Poisson regression models. Results: We found higher rates of stigmatizing labels per chart among patients who were Black or African American (RR: 1.16), patients with Medicare/Medicaid or government-run insurance (RR: 2.46), self-pay (RR: 2.12), and patients with a variety of stigmatizing disease and mental health conditions. Patterns among doubt markers were similar, though male patients had higher rates of doubt markers (RR: 1.25). We found increased stigmatizing labels used by nurses (RR: 1.40) and social workers (RR: 2.25), with similar patterns of doubt markers. Discussion: Stigmatizing language occurred at higher rates among historically stigmatized patients, perpetuated by multiple provider types.
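
To make the method concrete, the following is a minimal sketch of lexicon matching followed by a per-chart rate comparison. It is not the authors' pipeline. The terms in `DOUBT_MARKERS` are hypothetical placeholders rather than the CARE-SD lexicon, the notes are invented toy strings, and the function names are illustrative. The sketch also omits the supervised classifier step and computes only an unadjusted ratio, whereas the paper reports estimates from Poisson regression models with multiple predictors. It relies on the fact that, for a Poisson model with a single binary predictor, the fitted rate ratio equals the ratio of the two groups' mean counts.

```python
import math
import re

# Hypothetical placeholder terms for illustration; NOT the CARE-SD lexicon.
DOUBT_MARKERS = ["claims", "reportedly", "insists"]


def count_matches(note, lexicon):
    """Count whole-word, case-insensitive lexicon hits in one chart note."""
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in lexicon) + r")\b"
    return len(re.findall(pattern, note, flags=re.IGNORECASE))


def rate_ratio(counts_exposed, counts_reference, z=1.96):
    """Unadjusted rate ratio of matches per chart with a Wald interval.

    Assumes both groups contain at least one match; otherwise the
    log-scale standard error is undefined.
    """
    total_e = sum(counts_exposed)
    total_r = sum(counts_reference)
    rr = (total_e / len(counts_exposed)) / (total_r / len(counts_reference))
    # Standard error of log(RR) for two independent Poisson totals.
    se_log = math.sqrt(1 / total_e + 1 / total_r)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Invented toy notes for two hypothetical patient groups.
group_a = [
    "Patient claims pain is 10/10.",
    "He insists he took his meds; reportedly sober.",
    "No acute distress.",
]
group_b = [
    "Patient reports pain is 10/10.",
    "Claims adherence to regimen.",
    "Stable overnight.",
    "Resting comfortably.",
]

counts_a = [count_matches(note, DOUBT_MARKERS) for note in group_a]
counts_b = [count_matches(note, DOUBT_MARKERS) for note in group_b]
rr, lower, upper = rate_ratio(counts_a, counts_b)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With so few toy notes the interval is very wide. The sketch only illustrates how an RR such as those reported above can be read as a ratio of per-chart rates between a group and its reference.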
