CV · Feb 5

Generalization of Self-Supervised Vision Transformers for Protein Localization Across Microscopy Domains

Ben Isselmann, Dilara Göksu, Andreas Weinmann

arXiv:2602.05527v21 citations · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the problem of limited labeled data for training deep learning models in microscopy, which is a common bottleneck for biologists and medical researchers.

This paper investigates the cross-domain transferability of DINO-pretrained Vision Transformers for protein localization, finding that a microscopy-specific HPA-pretrained model achieved the best performance with a mean macro F1-score of 0.8221, slightly outperforming a model trained directly on the target dataset (0.8057). This demonstrates that domain-relevant self-supervised learning representations can generalize effectively to distinct microscopy datasets.

Task-specific microscopy datasets are often too small to train deep learning models that learn robust feature representations. Self-supervised learning (SSL) can mitigate this by pretraining on large unlabeled datasets, but it remains unclear how well such representations transfer across microscopy domains with different staining protocols and channel configurations. We investigate the cross-domain transferability of DINO-pretrained Vision Transformers for protein localization on the OpenCell dataset. We generate image embeddings using three DINO backbones pretrained on ImageNet-1k, the Human Protein Atlas (HPA), and OpenCell, and evaluate them by training a supervised classification head on OpenCell labels. All pretrained models transfer well, with the microscopy-specific HPA-pretrained model achieving the best performance (mean macro $F_1$-score = 0.8221 $\pm$ 0.0062), slightly outperforming a DINO model trained directly on OpenCell (0.8057 $\pm$ 0.0090). These results highlight the value of large-scale pretraining and indicate that domain-relevant SSL representations can generalize effectively to related but distinct microscopy datasets, enabling strong downstream performance even when task-specific labeled data are limited.
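The reported metric can be illustrated with a minimal sketch, which is not the authors' evaluation code. It assumes single-label classification, and it assumes the ± values are a standard deviation over repeated runs (the abstract does not say which kind). The function names `macro_f1` and `summarize_runs` and the toy labels are hypothetical.

```python
from statistics import mean, stdev


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def summarize_runs(scores):
    """Mean and sample standard deviation across repeated runs (assumed convention)."""
    return mean(scores), stdev(scores)


# Toy labels, made up for illustration only (not OpenCell data)
y_true = ["nucleus", "nucleus", "cytoplasm", "membrane"]
y_pred = ["nucleus", "cytoplasm", "cytoplasm", "membrane"]
score = macro_f1(y_true, y_pred)

# Hypothetical per-run scores, not values from the paper
run_mean, run_std = summarize_runs([0.80, 0.82, 0.81])
```

Because macro averaging weights every class equally, rare localization classes influence the score as much as common ones, which matters when class frequencies are imbalanced.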
