GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation
This work addresses a domain-specific bioinformatics problem relevant to researchers studying disease subtypes, offering incremental improvements by integrating existing methods such as deep generative models and graph neural networks.
The paper tackles the generation of subtype-specific gene interaction networks, targeting the mismatch between general knowledge databases and subtype-level variation. It proposes GeSubNet, which improves four graph evaluation metrics by 20.1% to 56.6% on average across four cancer datasets and identifies subtype-specific genes with an 83% likelihood of impacting patient distribution shifts.
Retrieving gene functional networks from knowledge databases is challenging because of the mismatch between general disease networks and subtype-specific variations. Current solutions, including statistical and deep learning methods, often fail to effectively integrate gene interaction knowledge from databases or to explicitly learn subtype-specific interactions. To address this mismatch, we propose GeSubNet, which learns a unified representation capable of predicting gene interactions while distinguishing between different disease subtypes. Graphs generated from such representations can be considered subtype-specific networks. GeSubNet is a multi-step representation learning framework with three modules. First, a deep generative model learns distinct disease subtypes from patient gene expression profiles. Second, a graph neural network captures representations of prior gene networks from knowledge databases, ensuring that the learned representations reflect accurate physical gene interactions. Finally, we integrate these two representations using an inference loss that leverages graph generation capabilities, conditioned on the patient separation loss, to refine the subtype-specific information in the learned representation. GeSubNet consistently outperforms traditional methods, with average improvements of 30.6%, 21.0%, 20.1%, and 56.6% across four graph evaluation metrics, averaged over four cancer datasets. In particular, we conduct a biological simulation experiment to assess how the behavior of selected genes, drawn from over 11,000 candidates, affects subtypes or patient distributions. The results show that the generated network has the potential to identify subtype-specific genes with an 83% likelihood of impacting patient distribution shifts.
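To make the three-module description more concrete, here is a minimal NumPy sketch of how a combined "graph generation plus patient separation" objective could be assembled. It is not GeSubNet's implementation: the abstract does not specify the architectures, so the sketch assumes a single GCN layer with symmetric normalization in place of the graph neural network, an inner-product decoder as the graph generator, a simple projection of patients onto gene embeddings in place of the deep generative model, and a hypothetical within/between-centroid distance ratio as the patient separation loss. All function names, shapes, and the weighting `lam` are illustrative assumptions.

```python
# Hypothetical sketch only: a toy objective in the spirit of the abstract,
# not the GeSubNet architecture or loss.
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gene_embeddings(adj: np.ndarray, gene_feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN layer over the prior gene network: ReLU(A_norm X W)."""
    return np.maximum(normalize_adjacency(adj) @ gene_feats @ weight, 0.0)


def edge_probabilities(z: np.ndarray) -> np.ndarray:
    """Inner-product decoder: sigmoid(Z Z^T) yields a dense gene-gene network."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))


def reconstruction_loss(adj: np.ndarray, probs: np.ndarray, eps: float = 1e-9) -> float:
    """Binary cross-entropy between prior edges and predicted edge probabilities."""
    return float(-np.mean(adj * np.log(probs + eps) + (1 - adj) * np.log(1 - probs + eps)))


def separation_loss(patient_latent: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical separation term: mean within-subtype distance to the centroid
    divided by mean distance between subtype centroids (lower = better separated)."""
    subtypes = np.unique(labels)
    centroids = np.stack([patient_latent[labels == k].mean(axis=0) for k in subtypes])
    within = np.mean([
        np.linalg.norm(patient_latent[labels == k] - c, axis=1).mean()
        for k, c in zip(subtypes, centroids)
    ])
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    between = pairwise[np.triu_indices(len(subtypes), k=1)].mean()
    return float(within / (between + 1e-9))


def combined_objective(expr: np.ndarray, labels: np.ndarray, prior_adj: np.ndarray,
                       weight: np.ndarray, lam: float = 1.0) -> tuple[float, np.ndarray]:
    """expr: patients x genes expression; prior_adj: genes x genes 0/1 prior network."""
    gene_feats = expr.T                              # each gene described by its expression across patients
    z = gene_embeddings(prior_adj, gene_feats, weight)  # genes x dim
    probs = edge_probabilities(z)                    # generated gene network
    patient_latent = expr @ z / expr.shape[1]        # patients projected into gene-embedding space
    loss = reconstruction_loss(prior_adj, probs) + lam * separation_loss(patient_latent, labels)
    return loss, probs


# Toy usage on random data (two synthetic subtypes, 12 genes).
rng = np.random.default_rng(0)
n_patients, n_genes, dim = 40, 12, 4
labels = np.repeat([0, 1], n_patients // 2)
expr = rng.normal(size=(n_patients, n_genes))
expr[labels == 1, :3] += 2.0                         # toy subtype signal on three genes
upper = np.triu(rng.random((n_genes, n_genes)) < 0.2, k=1)
prior_adj = (upper | upper.T).astype(float)          # symmetric random "prior" network
weight = rng.normal(scale=0.1, size=(n_patients, dim))
loss, probs = combined_objective(expr, labels, prior_adj, weight, lam=0.5)
print(round(loss, 3), probs.shape)
```

The sketch only evaluates a single forward pass; in practice the weights would be learned by minimizing such an objective with an automatic-differentiation framework, and a thresholded `probs` matrix would play the role of the generated gene network.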