LG ML · Sep 2, 2020

Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction

Ziyi Yang, Jun Shu, Yong Liang, Deyu Meng, Zongben Xu

arXiv:2009.00792v22.32 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of predicting disease subtypes with limited genomic data, which is crucial for personalized medicine, though it is incremental as it builds on existing meta-learning methods.

The authors tackled few-shot disease subtype prediction from genomic data by extending Prototypical Networks with feature selection and sample reweighting modules to handle high dimensionality and noise, achieving superior performance in simulations and real gene expression data experiments.

Machine learning has made great progress in computer vision and many other fields thanks to large amounts of high-quality training samples, but it performs less well on genomic data analysis, where datasets are notoriously small. In this work, we focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients, by training on small data, in order to guide treatment decisions for a specific individual. Doctors and clinicians typically address this problem by studying several interrelated clinical variables simultaneously. We attempt to simulate this clinical perspective and introduce meta-learning techniques to develop a new model that extracts common experience or knowledge from interrelated clinical tasks and transfers it to help address new tasks. Our model is built upon the Prototypical Network, a simple yet effective meta-learner for few-shot image classification. Observing that gene expression data have especially high dimensionality and high noise compared with image data, we propose an extension of the Prototypical Network that appends two modules to address these issues. Concretely, we append a feature selection layer that automatically filters out disease-irrelevant genes, and we incorporate a sample reweighting strategy that adaptively removes noisy data, while the extended model remains capable of learning from a limited number of training examples and generalizing well. Simulations and experiments on real gene expression data substantiate the superiority of the proposed method for predicting disease subtypes and identifying potential disease-related genes.
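To make the two appended modules concrete, here is a minimal sketch of one few-shot episode in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of either module, so the sigmoid gate per gene, the weighted-mean prototype, and all function names (`select_features`, `weighted_prototypes`, `predict_proba`) are hypothetical. The gate logits and sample weights are fixed by hand here, whereas in the paper they are learned, and the embedding network of a Prototypical Network is omitted.

```python
import math

def select_features(x, gate_logits):
    """Soft feature selection (assumed form): scale each gene by a sigmoid gate in (0, 1)."""
    return [xi / (1.0 + math.exp(-g)) for xi, g in zip(x, gate_logits)]

def weighted_prototypes(support, labels, weights, n_classes):
    """Class prototypes as weighted means of support samples (assumed form of reweighting)."""
    dim = len(support[0])
    protos = []
    for c in range(n_classes):
        total = sum(w for w, y in zip(weights, labels) if y == c)
        protos.append([
            sum(w * x[j] for x, y, w in zip(support, labels, weights) if y == c) / total
            for j in range(dim)
        ])
    return protos

def predict_proba(query, protos):
    """Softmax over negative squared Euclidean distances to the prototypes."""
    logits = [-sum((q - p) ** 2 for q, p in zip(query, proto)) for proto in protos]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy episode: 2 subtypes, 3 support samples each, 5 "genes".
# Genes 0-1 carry the subtype signal; genes 2-4 are pure noise.
support_x = [
    [0.0, 0.0, 5.0, -5.0, 3.0],
    [0.2, 0.0, -4.0, 4.0, -3.0],
    [10.0, 10.0, 0.0, 0.0, 0.0],   # noisy outlier labeled as subtype 0
    [3.0, 3.0, 4.0, -4.0, 2.0],
    [3.1, 2.9, -5.0, 5.0, -2.0],
    [3.0, 3.1, 0.0, 1.0, 0.0],
]
support_y = [0, 0, 0, 1, 1, 1]

# Hand-set values standing in for learned parameters (assumption).
gate_logits = [10.0, 10.0, -10.0, -10.0, -10.0]   # keep genes 0-1, suppress 2-4
sample_weights = [1.0, 1.0, 0.01, 1.0, 1.0, 1.0]  # down-weight the outlier

gated_support = [select_features(x, gate_logits) for x in support_x]
protos = weighted_prototypes(gated_support, support_y, sample_weights, n_classes=2)

queries = [[0.1, 0.1, 9.0, 9.0, 9.0], [2.9, 3.0, -9.0, -9.0, -9.0]]
probs = [predict_proba(select_features(q, gate_logits), protos) for q in queries]
preds = [p.index(max(p)) for p in probs]
print(preds)
```

In this toy setup the gates suppress the three noise genes and the small weight keeps the outlier from dragging the subtype-0 prototype toward subtype 1, so each query is assigned to the prototype that matches it on the informative genes.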
