CL AI
May 31, 2021

DiaKG: an Annotated Diabetes Dataset for Medical Knowledge Graph Construction

arXiv:2105.15033v2
27 citations
Originality: Synthesis-oriented
AI Analysis

This paper provides a domain-specific dataset to facilitate AI applications in diabetes, but the contribution is incremental: it offers a new dataset rather than novel methods.

The authors address the lack of high-quality annotated corpora for medical knowledge graphs by introducing DiaKG, a Chinese diabetes dataset containing 22,050 entities and 6,890 relations. Their benchmark of recent Named Entity Recognition and Relation Extraction methods shows that the dataset is challenging for most existing methods.

Knowledge Graph has been proven effective in modeling structured information and conceptual knowledge, especially in the medical domain. However, the lack of high-quality annotated corpora remains a crucial problem for advancing the research and applications on this task. In order to accelerate the research for domain-specific knowledge graphs in the medical domain, we introduce DiaKG, a high-quality Chinese dataset for Diabetes knowledge graph, which contains 22,050 entities and 6,890 relations in total. We implement recent typical methods for Named Entity Recognition and Relation Extraction as a benchmark to evaluate the proposed dataset thoroughly. Empirical results show that the DiaKG is challenging for most existing methods and further analysis is conducted to discuss future research direction for improvements. We hope the release of this dataset can assist the construction of diabetes knowledge graphs and facilitate AI-based applications.

Code Implementations: 1 repo