CV · AI · Sep 2, 2025

Unsupervised Training of Vision Transformers with Synthetic Negatives

arXiv:2509.02024v1 · h-index: 8
Originality: Synthesis-oriented
AI Analysis

The paper addresses a neglected aspect of self-supervised learning for vision transformers, but the contribution is incremental because it builds on existing synthetic-negative techniques.

The paper integrates synthetic hard negatives into self-supervised training of vision transformers to improve the learned representations, and reports performance improvements for the DeiT-S and Swin-T architectures.

This paper does not introduce a novel method per se. Instead, we address the neglected potential of hard negative samples in self-supervised learning. Previous works explored synthetic hard negatives but rarely in the context of vision transformers. We build on this observation and integrate synthetic hard negatives to improve vision transformer representation learning. This simple yet effective technique notably improves the discriminative power of learned representations. Our experiments show performance improvements for both DeiT-S and Swin-T architectures.
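To make the idea of synthetic hard negatives concrete, here is a minimal sketch of one common recipe from earlier work: take the negatives most similar to the query, mix pairs of them with a random convex combination, re-normalize, and add the results to the negative set of an InfoNCE-style contrastive loss. This is an illustration under stated assumptions, not the paper's exact procedure. The function names, `num_hard`, `num_synthetic`, the temperature of 0.2, and the random toy embeddings are all hypothetical choices; in practice the embeddings would come from a DeiT-S or Swin-T encoder.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (guarding against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def synthesize_hard_negatives(query, negatives, num_hard=4, num_synthetic=4, rng=None):
    """Mix pairs of the hardest negatives into new unit-norm negatives.

    Assumption: "hard" means highest cosine similarity to the query.
    """
    if rng is None:
        rng = random.Random(0)
    # Rank negatives by similarity to the query and keep the hardest ones.
    hardest = sorted(negatives, key=lambda n: dot(query, n), reverse=True)[:num_hard]
    synthetic = []
    for _ in range(num_synthetic):
        a, b = rng.sample(hardest, 2)   # two distinct hard negatives
        alpha = rng.random()            # mixing coefficient in [0, 1)
        mixed = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
        synthetic.append(normalize(mixed))
    return synthetic


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for one query: -log softmax of the positive logit."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy data standing in for encoder outputs (hypothetical, 8-dimensional).
rng = random.Random(0)
dim = 8
query = normalize([rng.gauss(0, 1) for _ in range(dim)])
positive = normalize([q + 0.1 * rng.gauss(0, 1) for q in query])
negatives = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(16)]

synthetic = synthesize_hard_negatives(query, negatives, rng=rng)
loss_plain = info_nce(query, positive, negatives)
loss_aug = info_nce(query, positive, negatives + synthetic)
print(f"InfoNCE without synthetic negatives: {loss_plain:.4f}")
print(f"InfoNCE with synthetic negatives:    {loss_aug:.4f}")
```

Adding negatives can only increase the InfoNCE loss for a fixed query and positive, so the second value is larger than the first. The intended effect during training is that the extra, harder negatives give the encoder a stronger signal for separating similar samples.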

