CV | MTRL-SCI | IV | Oct 20, 2024

Upsampling DINOv2 features for unsupervised vision tasks and weakly supervised materials segmentation

arXiv:2410.19836v2 | 9 citations | h-index: 3 | Adv Intell Syst
Originality: Synthesis-oriented
AI Analysis

This work addresses materials-characterization problems faced by researchers, but it is incremental: it builds on existing self-supervised features combined with standard methods.

The paper tackles unsupervised vision tasks and weakly supervised materials segmentation by leveraging upsampled features from self-supervised vision transformers such as DINOv2. It achieves strong benchmark performance, particularly in weakly supervised segmentation, where these features capture complex relationships.
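A ViT produces one feature vector per image patch, so its feature map is much coarser than the image (for example, a 224x224 image with 14-pixel patches gives a 16x16 grid). Per-pixel tasks therefore need the features brought back to pixel resolution. The sketch below assumes plain bilinear interpolation as a stand-in upsampler; this page does not say which upsampling method the paper uses, and the names `FeatureMap` and `bilinear_upsample` are hypothetical.

```python
from typing import List

# Hypothetical layout: grid[y][x] is the C-dimensional feature vector of one ViT patch.
FeatureMap = List[List[List[float]]]


def bilinear_upsample(grid: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Bilinearly interpolate a coarse patch-feature grid to out_h x out_w.

    Uses align-corners sampling: output corners land exactly on input corners.
    Plain bilinear interpolation is an assumed stand-in, not the paper's upsampler.
    """
    in_h, in_w = len(grid), len(grid[0])
    channels = len(grid[0][0])
    out: FeatureMap = []
    for oy in range(out_h):
        # Map this output row to a fractional input row and its two neighbours.
        fy = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for ox in range(out_w):
            # Same mapping along the columns.
            fx = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            vec = []
            for c in range(channels):
                # Interpolate horizontally on the two rows, then vertically between them.
                top = (1 - wx) * grid[y0][x0][c] + wx * grid[y0][x1][c]
                bottom = (1 - wx) * grid[y1][x0][c] + wx * grid[y1][x1][c]
                vec.append((1 - wy) * top + wy * bottom)
            row.append(vec)
        out.append(row)
    return out


# Usage: a 2x2 single-channel grid upsampled to 3x3.
coarse = [[[0.0], [1.0]], [[2.0], [3.0]]]
fine = bilinear_upsample(coarse, 3, 3)
print(fine[1][1])  # centre value blends all four corners
```

Bilinear interpolation only smooths between patch centres; learned or guided upsamplers can recover sharper object boundaries, which matters for segmentation.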

The features of self-supervised vision transformers (ViTs) contain strong semantic and positional information relevant to downstream tasks such as object localization and segmentation. Recent works combine these features with traditional methods such as clustering, graph partitioning or region correlations to achieve impressive baselines without finetuning or training additional networks. We leverage upsampled features from ViT networks (e.g., DINOv2) in two workflows: a clustering-based approach for object localization and segmentation, and a pairing with standard classifiers for weakly supervised materials segmentation. Both show strong performance on benchmarks, especially in weakly supervised segmentation, where the ViT features capture complex relationships inaccessible to classical approaches. We expect that the flexibility and generalizability of these features will both speed up and strengthen materials characterization, from segmentation to property prediction.
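The two workflows can be illustrated on per-pixel feature vectors. This is a minimal sketch under stated assumptions: k-means stands in for the unspecified "clustering based approach", and a nearest-centroid classifier stands in for the "standard classifiers" trained on a few labelled (for example, scribbled) pixels. The abstract names neither algorithm, and the toy 2-D vectors below are hypothetical stand-ins for upsampled DINOv2 features; all function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Unsupervised workflow (assumed k-means): cluster pixel features into k segments.

    Seeds are chosen deterministically by farthest-point selection.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each pixel to its closest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster empties).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


def fit_nearest_centroid(labelled: List[Tuple[Vector, int]]) -> Dict[int, List[float]]:
    """Weakly supervised workflow (assumed classifier): one mean feature per class
    computed from the few labelled pixels."""
    by_class: Dict[int, List[Vector]] = {}
    for vec, cls in labelled:
        by_class.setdefault(cls, []).append(vec)
    return {cls: mean(vecs) for cls, vecs in by_class.items()}


def predict(model: Dict[int, List[float]], points: List[Vector]) -> List[int]:
    """Label every pixel with the class whose mean feature is closest."""
    return [min(model, key=lambda cls: sq_dist(p, model[cls])) for p in points]


# Usage: four "pixels" whose features form two obvious groups.
pixels = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
segments = kmeans(pixels, k=2)
model = fit_nearest_centroid([([0.0, 0.0], 0), ([5.0, 5.0], 1)])
classes = predict(model, pixels)
print(segments, classes)
```

The point of both workflows is that the feature extractor does the heavy lifting: if the features already separate materials or objects, even very simple downstream methods like these can produce a segmentation without finetuning the ViT.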

Code implementations: 1 repo
