AI · Sep 30, 2025

MEDAKA: Construction of Biomedical Knowledge Graphs Using Large Language Models

arXiv:2509.26128v1 · h-index: 2 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for comprehensive biomedical knowledge graphs for tasks such as patient safety monitoring and drug recommendation. It is incremental, however: it applies existing LLM methods to a new data source (drug leaflets).

The authors tackle the problem of constructing biomedical knowledge graphs from unstructured drug leaflets. They present an end-to-end pipeline that uses a web scraper and an LLM to create the MEDAKA dataset, which captures clinically relevant attributes such as side effects and dosage guidelines. They evaluate the dataset through manual inspection and an LLM-as-a-Judge framework, and compare its coverage with existing resources.
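The scraper-plus-LLM pipeline summarized above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not MEDAKA's implementation: it assumes the scraper has already fetched a leaflet page's HTML (network access is omitted), and that the LLM is prompted to return a JSON object with one list of strings per attribute. The attribute keys, the prompt wording, the helpers `html_to_text`, `build_prompt`, `parse_llm_output` and `leaflet_to_triples`, and the `call_llm` callable are all hypothetical; the authors' actual code is in the repository at https://github.com/medakakg/medaka.

```python
import json
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable

# Hypothetical attribute keys mirroring the categories named in the abstract;
# the real MEDAKA schema may differ.
ATTRIBUTES = [
    "side_effects", "warnings", "contraindications", "ingredients",
    "dosage_guidelines", "storage_instructions", "physical_characteristics",
]


@dataclass(frozen=True)
class Triple:
    head: str      # drug name
    relation: str  # attribute key
    tail: str      # extracted value


class _TextExtractor(HTMLParser):
    """Collects visible text from scraped HTML, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical cleaning step for a leaflet page fetched by the scraper."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def build_prompt(drug: str, leaflet_text: str) -> str:
    """Hypothetical prompt asking for one JSON list of strings per attribute."""
    keys = ", ".join(ATTRIBUTES)
    return (
        f"Extract information about the drug '{drug}' from the leaflet below. "
        f"Answer with a JSON object whose keys are {keys}; "
        f"each value must be a list of short strings.\n\n{leaflet_text}"
    )


def parse_llm_output(drug: str, raw_json: str) -> list[Triple]:
    """Convert the model's JSON answer into (drug, attribute, value) triples."""
    data = json.loads(raw_json)
    triples = []
    for attribute in ATTRIBUTES:  # keys outside the schema are ignored
        for value in data.get(attribute) or []:  # missing or null -> no triples
            value = str(value).strip()
            if value:  # drop empty strings
                triples.append(Triple(drug, attribute, value))
    return list(dict.fromkeys(triples))  # deduplicate, keep first-seen order


def leaflet_to_triples(drug: str, html: str,
                       call_llm: Callable[[str], str]) -> list[Triple]:
    """call_llm is any function mapping a prompt string to the model's reply."""
    return parse_llm_output(drug, call_llm(build_prompt(drug, html_to_text(html))))


if __name__ == "__main__":
    page = "<html><body><h1>ExampleDrug</h1><p>May cause headache.</p></body></html>"
    # Stub standing in for a real LLM call; it returns canned JSON.
    stub = lambda prompt: json.dumps({"side_effects": ["headache", " headache "]})
    for triple in leaflet_to_triples("ExampleDrug", page, stub):
        print(triple)
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular provider SDK; a stub that returns canned JSON, as in the usage lines at the bottom, is enough to exercise the parsing and deduplication logic.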

Knowledge graphs (KGs) are increasingly used to represent biomedical information in structured, interpretable formats. However, existing biomedical KGs often focus narrowly on molecular interactions or adverse events, overlooking the rich data found in drug leaflets. In this work, we present (1) a hackable, end-to-end pipeline to create KGs from unstructured online content using a web scraper and an LLM; and (2) a curated dataset, MEDAKA, generated by applying this method to publicly available drug leaflets. The dataset captures clinically relevant attributes such as side effects, warnings, contraindications, ingredients, dosage guidelines, storage instructions and physical characteristics. We evaluate it through manual inspection and with an LLM-as-a-Judge framework, and compare its coverage with existing biomedical KGs and databases. We expect MEDAKA to support tasks such as patient safety monitoring and drug recommendation. The pipeline can also be used for constructing KGs from unstructured texts in other domains. Code and dataset are available at https://github.com/medakakg/medaka.
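Comparing coverage with existing KGs and databases requires a concrete metric. The paper's exact metric is not given here, so the sketch below assumes a simple one: for each attribute, the fraction of a shared set of drugs that have at least one value in a given resource. The function names `attribute_coverage` and `compare_coverage`, the `(drug, relation, value)` triple layout, and the toy data are hypothetical and are not MEDAKA results.

```python
from collections import defaultdict
from typing import Iterable


def attribute_coverage(triples: Iterable[tuple[str, str, str]],
                       drugs: set[str],
                       relations: list[str]) -> dict[str, float]:
    """Fraction of `drugs` that have at least one triple for each relation."""
    if not drugs:
        raise ValueError("drug set must be non-empty")
    covered = defaultdict(set)  # relation -> drugs with at least one value
    for drug, relation, _value in triples:
        if drug in drugs:  # ignore drugs outside the shared comparison set
            covered[relation].add(drug)
    return {rel: len(covered[rel]) / len(drugs) for rel in relations}


def compare_coverage(resources: dict[str, list[tuple[str, str, str]]],
                     drugs: Iterable[str],
                     relations: list[str]) -> dict[str, dict[str, float]]:
    """Coverage table: resource name -> {relation: fraction of drugs covered}."""
    drug_set = set(drugs)
    return {name: attribute_coverage(triples, drug_set, relations)
            for name, triples in resources.items()}


if __name__ == "__main__":
    # Toy data only; these are not MEDAKA results.
    medaka = [("drugA", "side_effects", "headache"),
              ("drugA", "dosage_guidelines", "1 tablet daily"),
              ("drugB", "side_effects", "rash")]
    other_kg = [("drugA", "side_effects", "headache"),
                ("drugC", "side_effects", "nausea")]
    table = compare_coverage({"MEDAKA": medaka, "other_kg": other_kg},
                             ["drugA", "drugB"],
                             ["side_effects", "dosage_guidelines"])
    for name, row in table.items():
        print(name, row)
    # MEDAKA {'side_effects': 1.0, 'dosage_guidelines': 0.5}
    # other_kg {'side_effects': 0.5, 'dosage_guidelines': 0.0}
```

Restricting the count to a shared drug list keeps resources of different sizes comparable; with raw triple counts instead, a KG that simply lists more drugs would look better on every attribute.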

