LG, GN · Apr 2, 2022

Cancer Subtyping via Embedded Unsupervised Learning on Transcriptomics Data

arXiv:2204.02278v1 · 2 citations · h-index: 50
Originality: Incremental advance
AI Analysis

This work addresses cancer subtyping for clinical treatment, but it appears incremental as it builds on existing unsupervised methods with a specific modification.

The paper tackles overfitting in automatic cancer subtyping systems, which arises from high dimensionality and data scarcity. It proposes an unsupervised learning approach that constructs the underlying data distribution so that sufficient data can be generated, and the authors report improved results in their experiments.

Cancer is one of the deadliest diseases worldwide. Accurate diagnosis and classification of cancer subtypes are indispensable for effective clinical treatment. Promising results on automatic cancer subtyping systems have been published recently with the emergence of various deep learning methods. However, such automatic systems often overfit because the data are high-dimensional and scarce. In this paper, we propose to investigate automatic subtyping from an unsupervised learning perspective by directly constructing the underlying data distribution itself, so that sufficient data can be generated to alleviate the issue of overfitting. Specifically, we use vector quantization to bypass the strong Gaussianity assumption that is common in the unsupervised learning subtyping literature but fails on small-sized samples. Our proposed method better captures the latent space features and models the cancer subtype manifestation on a molecular basis, as demonstrated by the extensive experimental results.
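
To make the vector quantization idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small, fixed codebook and 2-D latent codes with made-up values; the names `quantize`, `codebook`, and `latents` are hypothetical. Each latent vector is snapped to its nearest codebook entry, and new latent codes are then drawn from the empirical distribution over entries rather than from a Gaussian prior.

```python
import random


def quantize(z, codebook):
    """Return (index, code) of the codebook vector nearest to z (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    idx = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return idx, codebook[idx]


# Hypothetical codebook and latent codes (illustrative values, not from the paper).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.3], [-0.8, 1.7]]

# Discrete assignment of each latent vector to a codebook entry.
assignments = [quantize(z, codebook)[0] for z in latents]

# Assumption: generate extra latent codes by sampling codebook entries in
# proportion to how often they were used, with no Gaussian prior involved.
counts = [assignments.count(k) for k in range(len(codebook))]
rng = random.Random(0)
sampled = rng.choices(codebook, weights=counts, k=5)
```

In a full model the codebook would be learned jointly with an encoder and decoder, and the sampled codes would be decoded back into expression profiles; both steps are omitted here.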
