LG · GN · Apr 2, 2022

Cancer Subtyping via Embedded Unsupervised Learning on Transcriptomics Data

Ziwei Yang, Lingwei Zhu, Zheng Chen, Ming Huang, Naoaki Ono, MD Altaf-Ul-Amin, Shigehiko Kanaya

arXiv:2204.02278v1 · 1.82 citations · h-index: 50

Originality: Incremental advance

AI Analysis

This work addresses cancer subtyping for clinical treatment, but it appears incremental: it builds on existing unsupervised subtyping methods, and its main change is replacing their Gaussian latent assumption with vector quantization.

The paper tackles overfitting in automatic cancer subtyping systems, which stems from the high dimensionality and scarcity of the data. It proposes an unsupervised learning approach that models the underlying data distribution so that sufficient data can be generated, and it reports improved results in its experiments.

Cancer is one of the deadliest diseases worldwide. Accurate diagnosis and classification of cancer subtypes are indispensable for effective clinical treatment. With the emergence of various deep learning methods, promising results on automatic cancer subtyping systems have recently been published. However, such systems often overfit because the data are high-dimensional and samples are scarce. In this paper, we investigate automatic subtyping from an unsupervised learning perspective by directly modeling the underlying data distribution, so that sufficient data can be generated to alleviate overfitting. Specifically, we use vector quantization to bypass the strong Gaussianity assumption that is common in the unsupervised subtyping literature but fails for small sample sizes. As demonstrated by extensive experimental results, our method better captures latent-space features and models cancer subtype manifestation on a molecular basis.
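The abstract's key idea is replacing a Gaussian latent prior with vector quantization: each encoded sample is snapped to its nearest entry in a finite codebook, so the latent space is discrete and no Gaussian shape is assumed. The following is a minimal, generic sketch of that quantization step, not the authors' implementation. It assumes the latent vectors have already been produced by some encoder (omitted here), and the helper names `quantize` and `update_codebook` are hypothetical. For simplicity it uses a k-means-style codebook update as a stand-in; VQ-VAE-style models usually learn the codebook by gradient descent instead.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Snap each latent vector to its nearest codebook entry.

    Returns the chosen code index per latent and the quantized vectors.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Nearest-neighbour lookup: the codeword with the smallest distance to z.
        k = min(range(len(codebook)), key=lambda j: squared_distance(z, codebook[j]))
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized


def update_codebook(
    latents: Sequence[Vector], codebook: Sequence[Vector], indices: Sequence[int]
) -> List[List[float]]:
    """k-means-style update (assumed stand-in for gradient-based learning).

    Each codeword moves to the mean of the latents assigned to it;
    codewords with no assigned latents are left unchanged.
    """
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for z, k in zip(latents, indices):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += z[d]
    return [
        list(cw) if counts[k] == 0 else [s / counts[k] for s in sums[k]]
        for k, cw in enumerate(codebook)
    ]


if __name__ == "__main__":
    # Hypothetical 2-D latents and a 2-entry codebook, for illustration only.
    latents = [[0.1, 0.2], [0.9, 1.1], [0.05, -0.1], [1.2, 0.8]]
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    indices, _ = quantize(latents, codebook)
    print(indices)  # [0, 1, 0, 1]
    print(update_codebook(latents, codebook, indices))
```

Alternating `quantize` and `update_codebook` is plain k-means on the latents. In a full model, the decoder would reconstruct expression profiles from the quantized vectors, and sampling codebook entries would be one way to generate additional data.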

