CV, AI | Oct 24, 2025

Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2

arXiv:2511.05509v1 | 1 citation | h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses domain adaptation and interpretability in vision transformers for applications such as medical imaging. It is an incremental advance, building on DINOv2 and existing contrastive learning methods.

The paper tackles a problem in vision transformers such as DINOv2: they repurpose low-information patch tokens, which reduces interpretability, especially under domain shift in medical imaging. It introduces Randomized-MLP (RMLP) regularization, which improves or maintains downstream performance while producing more interpretable attention maps.

Vision Transformers (ViTs), such as DINOv2, achieve strong performance across domains but often repurpose low-informative patch tokens in ways that reduce the interpretability of attention and feature maps. This challenge is especially evident in medical imaging, where domain shifts can degrade both performance and transparency. In this paper, we introduce Randomized-MLP (RMLP) regularization, a contrastive learning-based method that encourages more semantically aligned representations. We use RMLPs when fine-tuning DINOv2 on both medical and natural image modalities, showing that they improve or maintain downstream performance while producing more interpretable attention maps. We also provide a mathematical analysis of RMLPs, offering insights into their role in enhancing ViT-based models and advancing our understanding of contrastive learning.

