CV | Aug 28, 2024

VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Microsoft
arXiv:2408.16176v1 | 15 citations | h-index: 42 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for biologists to leverage VLMs for automated analysis of biodiversity images, but it is incremental as it primarily benchmarks existing models on a new dataset.

The authors evaluate pre-trained vision-language models (VLMs) for trait discovery from biological images without fine-tuning, using a new dataset (VLM4Bio) of 469K question-answer pairs over 30K images. They find that the effectiveness of current SOTA VLMs varies, and they report further insights from prompting techniques and reasoning-hallucination tests.

Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask if pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images. The code and datasets for running all the analyses reported in this paper can be found at https://github.com/sammarfy/VLM4Bio.
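The evaluation described in the abstract, scoring a model's answers against question-answer pairs grouped by organism and task, can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record fields (`organism`, `task`, `image`, `question`, `answer`), the task labels, the lenient exact-match scoring, and the `predict` callable are all assumptions made for the example. The actual data format and metrics are defined in the VLM4Bio repository.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and trim whitespace and surrounding periods for a lenient exact match."""
    return text.strip().strip(".").lower()


def accuracy_by_group(records, predict):
    """Return {(organism, task): accuracy} for a zero-shot predictor.

    `records` is an iterable of dicts with hypothetical keys:
    organism, task, image, question, answer.
    `predict` is any callable (image, question) -> answer string,
    standing in for a pre-trained VLM queried without fine-tuning.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        key = (rec["organism"], rec["task"])
        prediction = predict(rec["image"], rec["question"])
        total[key] += 1
        if normalize(prediction) == normalize(rec["answer"]):
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy records; file names, task labels, and answers are illustrative placeholders.
records = [
    {"organism": "fish", "task": "species_classification",
     "image": "fish_001.jpg", "question": "What species is shown?",
     "answer": "Lepomis cyanellus"},
    {"organism": "fish", "task": "species_classification",
     "image": "fish_002.jpg", "question": "What species is shown?",
     "answer": "Lepomis macrochirus"},
    {"organism": "bird", "task": "trait_identification",
     "image": "bird_001.jpg", "question": "Is the bill curved?",
     "answer": "yes"},
]


def dummy_predict(image, question):
    """Stand-in for a VLM call; always returns the same answer."""
    return "Lepomis cyanellus."


scores = accuracy_by_group(records, dummy_predict)
for key, value in sorted(scores.items()):
    print(key, value)
```

Reporting accuracy per (organism, task) pair rather than as one pooled number keeps differences between organism groups and task types visible, which matters when a benchmark spans several of each.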

Code Implementations: 1 repo
