LG · Apr 23, 2024

Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction

arXiv:2404.14970v1 · 1 citation · h-index: 7 · Has Code · SeWeBMeDA@ESWC
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific challenge in biomedical data integration for diabetes prediction, offering an incremental improvement over existing methods.

The authors tackled two problems with gene expression data for diabetes prediction, limited sample sizes and incompatibility between datasets, by integrating multiple datasets and domain-specific knowledge in knowledge graphs, which improved prediction performance.

Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types such as gene expression data. While gene expression data can provide valuable insights, two challenges arise: sample sizes in expression datasets are usually limited, and datasets that measure different sets of genes cannot easily be combined. This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs (KGs), a unique tool for biomedical data integration. KG embedding methods are then employed to generate vector representations, which serve as inputs to a classifier. Experiments demonstrated the efficacy of our approach, revealing improvements in diabetes prediction when multiple gene expression datasets are integrated with domain-specific knowledge about protein functions and interactions.
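The pipeline described above (merge datasets and domain knowledge into one KG, embed the sample nodes, classify the vectors) can be sketched in plain Python. This is a toy illustration, not the authors' implementation. Everything here is hypothetical: the gene names G1 to G4, the function terms F1/F2, the relation names, the expression threshold `HIGH`, and all data values are invented. The "embedding" is a bag of `relation:entity` tokens collected over all outgoing walks of up to two hops, a crude deterministic stand-in for learned KG embeddings such as RDF2Vec or TransE. The classifier is a nearest-centroid rule with cosine similarity, chosen for brevity rather than taken from the paper.

```python
from collections import Counter
import math

# Hypothetical toy data: two datasets measuring DIFFERENT gene panels.
# sample -> ({gene: expression value}, label), label 1 = diabetic (assumed).
DATASET_A = {
    "a1": ({"G1": 2.1, "G2": 1.8, "G3": 0.2}, 1),
    "a2": ({"G1": 0.3, "G2": 0.4, "G3": 1.9}, 0),
}
DATASET_B = {
    "b1": ({"G2": 1.7, "G3": 0.1, "G4": 0.3}, 1),
    "b2": ({"G2": 0.2, "G3": 1.6, "G4": 2.2}, 0),
}
# Hypothetical domain knowledge: protein interactions and protein functions.
PPI = [("G1", "G2"), ("G3", "G4")]
FUNCTIONS = [("G1", "F1"), ("G2", "F1"), ("G3", "F2"), ("G4", "F2")]
HIGH = 1.0  # assumed threshold for turning expression values into edges


def build_kg(datasets):
    """Return an adjacency map: node -> list of (relation, object)."""
    kg = {}

    def add(s, r, o):
        kg.setdefault(s, []).append((r, o))

    for a, b in PPI:  # interactions are stored in both directions
        add(a, "interactsWith", b)
        add(b, "interactsWith", a)
    for gene, func in FUNCTIONS:
        add(gene, "hasFunction", func)
    for data in datasets:
        for sample, (expr, _label) in data.items():
            kg.setdefault(sample, [])  # keep samples with no high genes
            for gene, value in expr.items():
                if value >= HIGH:
                    add(sample, "highExpression", gene)
    return kg


def embed(kg, node, depth=2):
    """Count relation:entity tokens over all outgoing walks up to `depth` hops."""
    tokens = Counter()
    frontier = [node]
    for _ in range(depth):
        nxt = []
        for n in frontier:
            for rel, obj in kg.get(n, []):
                tokens[f"{rel}:{obj}"] += 1
                nxt.append(obj)
        frontier = nxt
    return tokens


def cosine(u, v):
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroids(kg, datasets):
    """Average the token vectors of the labelled samples, per class."""
    sums, counts = {}, Counter()
    for data in datasets:
        for sample, (_expr, label) in data.items():
            sums.setdefault(label, Counter()).update(embed(kg, sample))
            counts[label] += 1
    return {lab: {k: v / counts[lab] for k, v in vec.items()}
            for lab, vec in sums.items()}


def predict(kg, cents, sample):
    vec = embed(kg, sample)
    return max(cents, key=lambda lab: cosine(vec, cents[lab]))


if __name__ == "__main__":
    train = [DATASET_A, DATASET_B]
    # Unlabelled query measured on dataset B's panel (hypothetical values).
    query = {"q": ({"G2": 1.5, "G3": 0.2, "G4": 0.1}, None)}
    kg = build_kg(train + [query])
    print(predict(kg, centroids(kg, train), "q"))  # prints 1
```

The design point the sketch tries to show: `a1` and `b1` come from datasets with different gene panels, yet both land in one shared token space through the genes they share and through the interaction and function edges. A query that is high only in G1, a gene dataset B never measured, is still linked to the diabetic class via the shared function term F1.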

Code Implementations: 1 repo
