CL | AI | LG | Dec 16, 2024

Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search

arXiv:2412.15256v1 | 5 citations | h-index: 3 | BigData
Originality: Incremental advance
AI Analysis

This work addresses the challenge clinicians and researchers face when mapping complex medical data to rigid ontologies. It is an incremental advance that builds on existing LLM and ontology methods.

The authors address the problem of extracting nuanced patient conditions from electronic health records. They use large language models to create patient-specific knowledge graphs, which enable symptom-based searches that identified rare disease patients with high precision (e.g., 0.85 for Dravet syndrome).

Creation and curation of knowledge graphs can accelerate disease discovery and analysis in real-world data. While disease ontologies aid in biological data annotation, codified categories (SNOMED-CT, ICD10, CPT) may not capture patient condition nuances or rare diseases. Multiple disease definitions across data sources complicate ontology mapping and disease clustering. We propose creating patient knowledge graphs using large language model extraction techniques, allowing data extraction via natural language rather than rigid ontological hierarchies. Our method maps to existing ontologies (MeSH, SNOMED-CT, RxNORM, HPO) to ground extracted entities. Using a large ambulatory care EHR database with 33.6M patients, we demonstrate our method through the patient search for Dravet syndrome, which received ICD10 recognition in October 2020. We describe our construction of patient-specific knowledge graphs and symptom-based patient searches. Using confirmed Dravet syndrome ICD10 codes as ground truth, we employ LLM-based entity extraction to characterize patients in grounded ontologies. We then apply this method to identify Beta-propeller protein-associated neurodegeneration (BPAN) patients, demonstrating real-world discovery where no ground truth exists.
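
The symptom-based patient search and its evaluation against ICD10 ground truth can be illustrated with a minimal sketch. This is not the authors' implementation: the `PatientGraph` class, the `symptom_search` and `precision` functions, the concept labels, the `min_matches` threshold, and the four toy patients are all hypothetical, and the LLM extraction and ontology grounding steps are assumed to have already produced the concept sets.

```python
from dataclasses import dataclass, field


@dataclass
class PatientGraph:
    """Toy patient knowledge graph: one patient linked to grounded concept labels.

    Hypothetical structure; the concept labels stand in for ontology-grounded
    entities (e.g. HPO or SNOMED-CT terms) produced by an upstream extractor.
    """
    patient_id: str
    concepts: set[str] = field(default_factory=set)
    # True if the record carries a confirmed diagnosis code (the ground truth).
    has_ground_truth_code: bool = False


def symptom_search(graphs, query_concepts, min_matches):
    """Return IDs of patients whose graph shares at least min_matches query concepts."""
    query = set(query_concepts)
    return [g.patient_id for g in graphs if len(g.concepts & query) >= min_matches]


def precision(retrieved, graphs):
    """Fraction of retrieved patients that carry the ground-truth code."""
    if not retrieved:
        return 0.0
    truth = {g.patient_id for g in graphs if g.has_ground_truth_code}
    return sum(1 for pid in retrieved if pid in truth) / len(retrieved)


# Made-up toy data, for illustration only.
graphs = [
    PatientGraph("p1", {"febrile seizure", "developmental delay", "ataxia"}, True),
    PatientGraph("p2", {"febrile seizure", "developmental delay"}, True),
    PatientGraph("p3", {"febrile seizure", "ataxia"}, False),
    PatientGraph("p4", {"headache"}, False),
]
query = ["febrile seizure", "developmental delay", "ataxia"]

hits = symptom_search(graphs, query, min_matches=2)
score = precision(hits, graphs)
print(hits, round(score, 2))
```

On this toy data the search retrieves three patients, two of whom carry the ground-truth code. A set-overlap threshold is only one possible matching rule; it is used here because it keeps the relationship between the query symptoms and the retrieved patients easy to inspect.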
