Using ontology embeddings for structural inductive bias in gene expression data analysis
This work is significant for cancer researchers and clinicians, as it offers an improved method for patient stratification using gene expression data, which can lead to better diagnosis and treatment planning.
This paper addresses the challenge of high-dimensional, low-sample gene expression data for cancer patient stratification. By integrating prior biological knowledge from ontologies via ontology embeddings to guide a Graph Convolutional Network, the authors demonstrate an advantage in predicting clinical targets.
Stratifying cancer patients by their gene expression levels can improve diagnosis, survival analysis and treatment planning. However, such data is extremely high-dimensional, containing expression values for over 20,000 genes per patient, while the number of samples in the datasets is low. To deal with this setting, we propose incorporating prior biological knowledge about genes from ontologies into the machine learning system for the task of classifying patients from their gene expression data. We use ontology embeddings that capture the semantic similarities between genes to direct a Graph Convolutional Network, and thereby sparsify the network connections. We show that this approach provides an advantage for predicting clinical targets from high-dimensional, low-sample data.
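The idea of letting embedding similarity decide which genes are connected can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the graph is built, so the k-nearest-neighbour rule on cosine similarity, the symmetric normalization, the single-layer form, and all names (`knn_adjacency`, `normalize_adjacency`, `gcn_layer`) are assumptions made for illustration. The embeddings and expression values are random placeholders, not real data.

```python
import numpy as np

def knn_adjacency(embeddings, k):
    """Connect each gene to its k most similar genes (cosine similarity).

    Assumption: a k-NN rule is one possible way to sparsify the graph.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a gene is never its own neighbour
    n = sim.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]  # k most similar per row
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected

def normalize_adjacency(adj):
    """Add self-loops and apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, x, weight):
    """One graph convolution: ReLU(a_hat @ x @ weight), x is (n_genes, in_features)."""
    return np.maximum(a_hat @ x @ weight, 0.0)

rng = np.random.default_rng(0)
n_genes, emb_dim, k = 50, 16, 5

# Placeholder inputs: in practice these would be ontology embeddings of genes
# and the measured expression values of one patient.
gene_embeddings = rng.normal(size=(n_genes, emb_dim))
expression = rng.normal(size=(n_genes, 1))

adj = knn_adjacency(gene_embeddings, k)
a_hat = normalize_adjacency(adj)
weight = rng.normal(size=(1, 4))  # untrained weights, for shape illustration only
hidden = gcn_layer(a_hat, expression, weight)
```

With a small k, each gene exchanges information with only a handful of semantically related genes instead of all 20,000, which is the sense in which prior knowledge sparsifies the network connections.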