Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide
This work supports pharmacovigilance for healthcare professionals and patients by offering a method to analyze real-world side effect reports from social media. Its contribution is incremental, since it applies existing LLM techniques to a new domain.
The authors tackle the challenge of extracting medication side effects from noisy social media data by developing a framework that uses large language models to construct a knowledge graph. They apply the framework to semaglutide for weight loss and validate the results against the FDA Adverse Event Reporting System (FAERS) database, yielding patient-centered insights.
Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG). We apply this framework to semaglutide for weight loss using data from Reddit. Using the constructed KG, we perform comprehensive analyses to investigate reported side effects across different semaglutide brands over time. We further validate these findings by comparing them with adverse events reported in the FAERS database. The results provide patient-centered insights into semaglutide's side effects that complement its known safety profile and the current knowledge base for both healthcare professionals and patients. Our work demonstrates the feasibility of using LLMs to transform social media data into structured KGs for pharmacovigilance.
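To make the extract-then-organize idea concrete, the following is a minimal sketch of how side effects already extracted from posts might be stored as KG edges and counted per brand. It is not the authors' implementation: the record layout (`post_id`, `brand`, `side_effects`), the relation name `has_side_effect`, the helper names, and the sample records are all hypothetical, and the LLM extraction step is assumed to have run beforehand.

```python
from collections import defaultdict

# Hypothetical output of an LLM extraction step, one record per post.
# Field names and values are illustrative only, not data from the paper.
extracted = [
    {"post_id": "p1", "brand": "Ozempic", "side_effects": ["nausea", "fatigue"]},
    {"post_id": "p2", "brand": "Wegovy", "side_effects": ["nausea"]},
    {"post_id": "p3", "brand": "Ozempic", "side_effects": ["Nausea ", "constipation"]},
]


def build_kg(records):
    """Map each (brand, relation, effect) edge to the set of posts supporting it."""
    edges = defaultdict(set)
    for rec in records:
        for effect in rec["side_effects"]:
            # Light normalization so "Nausea " and "nausea" share one node.
            normalized = effect.strip().lower()
            edges[(rec["brand"], "has_side_effect", normalized)].add(rec["post_id"])
    return edges


def effect_counts(edges, brand):
    """Count distinct supporting posts per side effect for one brand."""
    return {
        effect: len(posts)
        for (edge_brand, _relation, effect), posts in edges.items()
        if edge_brand == brand
    }


kg = build_kg(extracted)
ozempic_counts = effect_counts(kg, "Ozempic")
wegovy_counts = effect_counts(kg, "Wegovy")
```

Storing the supporting post identifiers on each edge, rather than a bare count, keeps every edge traceable to its source posts. Under that assumption, per-brand counts like these could then be set beside adverse event frequencies from a reference source such as FAERS.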