QM LG · Jun 10, 2025

scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data

Olga Ovcharenko, Florian Barkmann, Philip Toma, Imant Daunhawer, Julia Vogt, Sebastian Schelter, Valentina Boeva

arXiv:2506.10031v11.22 citations · h-index: 11 · Has Code · ICML

Originality: Synthesis-oriented

AI Analysis

This work provides a standardized benchmark for researchers in single-cell genomics to guide the application of self-supervised learning methods, though it is incremental as it focuses on evaluation rather than introducing new methods.

The authors tackled the problem of evaluating self-supervised learning methods for single-cell data by creating scSSL-Bench, a benchmark that tested nineteen methods across nine datasets and three tasks, finding that specialized frameworks like scVI excel at batch correction while generic methods like VICReg perform better in cell typing and multi-modal integration.

Self-supervised learning (SSL) has proven to be a powerful approach for extracting biologically meaningful representations from single-cell data. To advance our understanding of SSL methods applied to single-cell data, we present scSSL-Bench, a comprehensive benchmark that evaluates nineteen SSL methods. Our evaluation spans nine datasets and focuses on three common downstream tasks: batch correction, cell type annotation, and missing modality prediction. Furthermore, we systematically assess various data augmentation strategies. Our analysis reveals task-specific trade-offs: the specialized single-cell frameworks, scVI, CLAIRE, and the finetuned scGPT excel at uni-modal batch correction, while generic SSL methods, such as VICReg and SimCLR, demonstrate superior performance in cell typing and multi-modal data integration. Random masking emerges as the most effective augmentation technique across all tasks, surpassing domain-specific augmentations. Notably, our results indicate the need for a specialized single-cell multi-modal data integration framework. scSSL-Bench provides a standardized evaluation platform and concrete recommendations for applying SSL to single-cell analysis, advancing the convergence of deep learning and single-cell genomics.
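
The abstract names random masking as the most effective augmentation but does not describe how it is implemented. A minimal sketch of the general idea follows, assuming a dense cells-by-genes expression matrix and an independent per-entry mask. The function name `random_mask`, the `mask_fraction` parameter, and its default of 0.2 are illustrative assumptions, not the benchmark's actual code or settings.

```python
import numpy as np


def random_mask(expression, mask_fraction=0.2, rng=None):
    """Return a copy of `expression` with a random subset of entries zeroed.

    Hypothetical helper for illustration; the masking rate used in
    scSSL-Bench is not stated in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    expression = np.asarray(expression, dtype=float)
    # Draw one uniform number per entry; an entry is kept when its draw
    # is at or above the masking fraction, and zeroed otherwise.
    keep = rng.random(expression.shape) >= mask_fraction
    return expression * keep


# Toy cells-by-genes count matrix (synthetic data, not from the paper).
rng = np.random.default_rng(0)
cells = rng.poisson(lam=2.0, size=(4, 10)).astype(float)

# Two independently masked views of the same cells, as a contrastive
# method such as SimCLR would consume them.
view_a = random_mask(cells, rng=rng)
view_b = random_mask(cells, rng=rng)
```

In a contrastive or VICReg-style setup, the two views of each cell would be passed through a shared encoder and trained to agree. The encoder and the loss are outside the scope of this sketch.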
