CVGEO-PHJul 25, 2024

DINOv2 Rocks Geological Image Analysis: Classification, Segmentation, and Interpretability

arXiv:2407.18100v36 citationsh-index: 2
Originality Synthesis-oriented
AI Analysis

This work addresses the problem of domain adaptation for geoscientists, but it is incremental as it applies existing methods like DINOv2 to a new domain without introducing novel techniques.

This study tackled the challenge of applying computer vision to geological CT-scan images, where data is scarce and models often struggle outside their training distribution. It found that DINOv2, especially when fine-tuned with LoRA, excelled in classification and segmentation tasks, outperforming other methods in multi-class segmentation and showing strong generalization with limited data.

Recent advancements in computer vision have significantly improved image analysis tasks. Yet, deep learning models often struggle when applied to domains outside their training distribution, such as in geosciences, where domain-specific data can be scarce. This study investigates the classification, segmentation, and interpretability of CT-scan images of rock samples, focusing on the application of modern computer vision techniques to geoscientific tasks. We compare a range of segmentation methods to assess their efficacy, efficiency, and adaptability in geological image analysis. The methods evaluated include Otsu thresholding, clustering techniques (K-means, fuzzy C-means), a supervised machine learning approach (Random Forest), and deep learning models (UNet, ResNet152, and DINOv2), using ten binary sandstone datasets and three multi-class calcite datasets. DINOv2 was selected for its promising results in feature extraction and its potential applicability in geoscientific tasks, prompting further assessment of its interpretability and effectiveness in processing CT-scanned rock data. For classification, a non-fine-tuned DINOv2 demonstrates strong performance in classifying rock images, even when the CT-scans are outside its original training set. In segmentation tasks, thresholding and clustering techniques, though computationally efficient, produce subpar results despite preprocessing efforts. In contrast, supervised methods achieve better performance. While deep learning methods demand greater computational resources, they require minimal intervention and offer superior generalization. A LoRA fine-tuned DINOv2, in particular, excels in out-of-distribution segmentation and outperforms other methods in multi-class tasks, even with limited data. Notably, the segmentation masks generated by DINOv2 often appear more accurate than the original targets, based on visual inspection.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes