SNOMED CT-powered Knowledge Graphs for Structured Clinical Data and Diagnostic Reasoning
This work addresses the challenge of noisy and inconsistent training data for AI in healthcare, offering a scalable solution for reliable AI-assisted clinical systems, though it is incremental as it builds on existing terminology and graph database methods.
The paper addresses the problem that unstructured clinical documentation hinders AI in healthcare. It develops a knowledge-driven framework that uses SNOMED CT and Neo4j to construct structured medical knowledge graphs, and fine-tuning LLMs on datasets derived from these graphs improved the clinical logic consistency of their outputs.
The effectiveness of artificial intelligence (AI) in healthcare is hindered by unstructured clinical documentation, which yields noisy, inconsistent, and logically fragmented training data. To address this challenge, we present a knowledge-driven framework that integrates the standardized clinical terminology SNOMED CT with the Neo4j graph database to construct a structured medical knowledge graph. In this graph, clinical entities such as diseases, symptoms, and medications are represented as nodes, and semantic relationships such as ``caused by,'' ``treats,'' and ``belongs to'' are modeled as edges in Neo4j, with edge types mapped from formal SNOMED CT relationship concepts (e.g., \texttt{Causative agent}, \texttt{Indicated for}). This design enables multi-hop reasoning and ensures terminological consistency. By extracting and standardizing entity-relationship pairs from clinical texts, we generate structured, JSON-formatted datasets that embed explicit diagnostic pathways. These datasets are used to fine-tune large language models (LLMs), significantly improving the clinical logic consistency of their outputs. Experimental results demonstrate that our knowledge-guided approach enhances the validity and interpretability of AI-generated diagnostic reasoning, providing a scalable solution for building reliable AI-assisted clinical systems.
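
The pipeline described above (typed edges, multi-hop traversal, JSON-formatted pathways) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use Neo4j: a plain Python dictionary stands in for the graph database so the example runs without a server. The triples are toy data, and the edge-type strings (\texttt{CAUSED\_BY}, \texttt{TREATS}, \texttt{BELONGS\_TO}), the function names, and the JSON field names are all assumptions made for illustration.

```python
import json
from collections import deque

# Assumed mapping from SNOMED CT relationship names to graph edge types.
# The names on the left follow the examples in the text; the edge-type
# strings on the right are illustrative, not taken from the paper.
EDGE_TYPE = {
    "Causative agent": "CAUSED_BY",
    "Indicated for": "TREATS",
    "Is a": "BELONGS_TO",
}

# Toy (head, relationship, tail) triples; not real extracted clinical data.
TRIPLES = [
    ("Amoxicillin", "Indicated for", "Bacterial pneumonia"),
    ("Bacterial pneumonia", "Causative agent", "Streptococcus pneumoniae"),
    ("Bacterial pneumonia", "Is a", "Pneumonia"),
]


def build_graph(triples):
    """Build an adjacency map: node -> list of (edge_type, neighbor)."""
    graph = {}
    for head, relationship, tail in triples:
        graph.setdefault(head, []).append((EDGE_TYPE[relationship], tail))
        graph.setdefault(tail, [])  # make sure leaf nodes exist as keys
    return graph


def find_path(graph, start, goal, max_hops=3):
    """Breadth-first search for the shortest typed path from start to goal.

    Returns a list of (head, edge_type, tail) steps, or None if the goal
    is not reachable within max_hops.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for edge_type, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, edge_type, neighbor)]))
    return None


def to_record(path):
    """Serialize a reasoning path as a JSON training record (assumed schema)."""
    return json.dumps(
        {
            "hops": len(path),
            "pathway": [
                {"head": h, "relation": r, "tail": t} for h, r, t in path
            ],
        },
        indent=2,
    )


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    path = find_path(graph, "Amoxicillin", "Streptococcus pneumoniae")
    print(to_record(path))
```

In this sketch, the two-hop path from a medication to a causative organism passes through the disease node, which is the kind of explicit diagnostic pathway the structured dataset is meant to expose. In a Neo4j deployment the breadth-first search would presumably be replaced by a variable-length path query, but that substitution is outside the scope of this example.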