GNLGQM Sep 9, 2024

Hierarchical novel class discovery for single-cell transcriptomic profiles

arXiv:2409.05937v1
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the challenge of automated annotation for large-scale single-cell transcriptomics data in developmental biology, but it is incremental as it builds on existing clustering methods.

The paper tackles the annotation of single-cell transcriptomic profiles in developmental biology in the Novel Class Discovery setting, where labeled and unlabeled data have disjoint label sets. It proposes extensions of k-Means and GMM clustering methods that leverage the hierarchical structure of the data, and reports comparative results on artificial and experimental datasets.

One of the major challenges arising from single-cell transcriptomics experiments is how to annotate the resulting single-cell transcriptomic profiles. Because the data are large and high-dimensional, automated annotation methods are needed. We focus here on datasets obtained in the context of developmental biology, where the differentiation process leads to a hierarchical structure. We consider a frequent setting where both labeled and unlabeled data are available at training time, but the label set of the labeled data and the label set of the unlabeled data are disjoint. This is an instance of the Novel Class Discovery problem. The goal is twofold: clustering the data and mapping the clusters to labels. We propose extensions of the k-Means and GMM clustering methods to solve this problem and report comparative results on artificial and experimental transcriptomic datasets. Our approaches take advantage of the hierarchical nature of the data.
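The two objectives named in the abstract (clustering the unlabeled data, then relating the clusters to labels) can be illustrated with a toy sketch. This is not the authors' algorithm: it is plain k-means on the unlabeled points, followed by a simple nearest-centroid rule that attaches each discovered cluster to a known class. The function name `discover_novel_classes`, its parameters, and the assumption that a novel class hangs under its closest known class in the hierarchy are all hypothetical and chosen only for illustration.

```python
import numpy as np

def discover_novel_classes(X_lab, y_lab, X_unl, n_novel, n_iter=50, seed=0):
    """Toy Novel Class Discovery sketch (hypothetical, not the paper's method).

    Step 1: cluster the unlabeled points into `n_novel` clusters with k-means.
            Labeled points are excluded because the label sets are disjoint.
    Step 2: attach each novel cluster to the nearest known-class centroid,
            an assumed stand-in for a parent node in the hierarchy.
    """
    rng = np.random.default_rng(seed)

    # Centroids of the known classes, computed from the labeled data.
    known = np.unique(y_lab)
    known_centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in known])

    # Initialise novel centroids from distinct random unlabeled points.
    idx = rng.choice(len(X_unl), size=n_novel, replace=False)
    centroids = X_unl[idx].astype(float)

    def nearest(points, centers):
        # Index of the closest center for each point (squared Euclidean).
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        assign = nearest(X_unl, centroids)
        updated = centroids.copy()
        for k in range(n_novel):
            members = X_unl[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                updated[k] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated

    assign = nearest(X_unl, centroids)
    # Assumed hierarchy rule: parent = closest known class.
    parents = known[nearest(centroids, known_centroids)]
    return assign, parents
```

In this sketch the labeled data only supply reference centroids. A method that truly exploits the hierarchy, as the abstract describes, would let that structure influence the clustering step itself rather than only the final mapping.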

Code Implementations: 1 repo
