Topic Modeling on Clinical Social Work Notes for Exploring Social Determinants of Health Factors
This work addresses the need for better extraction of social determinants of health (SDoH) data in healthcare, particularly for social workers and researchers. Its contribution is incremental, however, because it applies existing methods to a new data source.
The researchers addressed the problem of identifying social determinants of health (SDoH) by analyzing clinical social work notes, a source that is often overlooked. Topic modeling on 0.95 million notes extracted 11 robust topics related to factors such as financial status and abuse history, indicating that these notes contain rich and unique SDoH information.
Most research studying social determinants of health (SDoH) has focused on physician notes or structured elements of the electronic medical record (EMR). We hypothesize that clinical notes from social workers, whose role is to ameliorate social and economic factors, might provide a richer source of data on SDoH. We sought to perform topic modeling to identify robust topics of discussion within a large cohort of social work notes. We retrieved a diverse, deidentified corpus of 0.95 million clinical social work notes from 181,644 patients at the University of California, San Francisco. We used word frequency analysis and Latent Dirichlet Allocation (LDA) topic modeling to characterize this corpus and identify potential topics of discussion. Word frequency analysis identified both medical and non-medical terms associated with specific ICD-10 chapters. The LDA topic modeling extracted 11 topics related to SDoH risk factors, including financial status, abuse history, social support, risk of death, and mental health. In addition, the topic modeling approach captured the variation between different types of social work notes and across patients with different types of diseases or conditions. We demonstrated that social work notes contain rich, unique, and otherwise unobtainable information on an individual's SDoH.
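As a rough illustration of the two analyses named above, the following sketch counts word frequencies and fits a small LDA model using only the Python standard library. It is not the authors' pipeline. The toy notes, the function names (`tokenize`, `fit_lda`, `top_words`), the hyperparameters, and the choice of collapsed Gibbs sampling as the inference method are all assumptions made for illustration; the abstract does not state how the model was fitted.

```python
import random
from collections import Counter

# Hypothetical toy notes; the real corpus is deidentified and not reproduced here.
NOTES = [
    "patient reports financial stress housing insurance",
    "insurance coverage financial assistance housing referral",
    "family support caregiver visit family",
    "caregiver family support discharge planning",
    "anxiety depression mental health counseling",
    "depression counseling mental health support",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (assumed inference method).

    Returns the vocabulary, per-document topic counts, and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    ranked = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda v: -row[v])
        ranked.append([vocab[v] for v in order[:n]])
    return ranked


docs = [tokenize(note) for note in NOTES]
word_freq = Counter(w for doc in docs for w in doc)  # corpus-wide word frequencies
vocab, doc_topic, topic_word = fit_lda(docs, n_topics=3)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {' '.join(words)}")
```

The toy run uses three topics only because the hypothetical notes are tiny; the study reports 11 topics on the full corpus. For a corpus of that size, an established library implementation of LDA would be a more practical choice than this sampler.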