CV | Feb 26, 2025

FungalZSL: Zero-Shot Fungal Classification with Image Captioning Using a Synthetic Data Approach

arXiv:2502.19038v1 | 1 citation | h-index: 16 | ACPR
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific challenge in fungal biology by providing an incremental method to improve zero-shot learning with synthetic data.

The paper tackled the problem of zero-shot fungal classification by generating synthetic text and image datasets to enhance CLIP's capabilities, resulting in improved classification performance across fungal growth stages.

The effectiveness of zero-shot classification in large vision-language models (VLMs), such as Contrastive Language-Image Pre-training (CLIP), depends on access to extensive, well-aligned text-image datasets. In this work, we introduce two complementary data sources: one generated by large language models (LLMs) to describe the stages of fungal growth, and another comprising a diverse set of synthetic fungi images. These datasets are designed to enhance CLIP's zero-shot classification capabilities for fungi-related tasks. To ensure effective alignment between text and image data, we project them into CLIP's shared representation space, focusing on different fungal growth stages. We generate text using LLaMA 3.2 to bridge modality gaps and synthetically create fungi images. Furthermore, we investigate knowledge transfer by comparing text outputs from different LLM techniques to refine classification across growth stages.
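
The zero-shot step described in the abstract, where text and images are compared in a shared representation space, can be illustrated with a minimal sketch. This is not the paper's implementation: the three-dimensional vectors stand in for real CLIP embeddings, the stage labels (`spore`, `hyphae`, `mycelium`) are hypothetical placeholders, and the `temperature` scaling factor is an assumed value. The sketch only shows the general mechanism: normalize both embeddings, score each label by cosine similarity, and turn the scores into probabilities with a softmax.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, temperature=100.0):
    """Pick the label whose text embedding is closest to the image embedding.

    text_embeddings maps a label to its embedding. temperature is an assumed
    scaling factor applied to the similarities before the softmax.
    """
    labels = list(text_embeddings)
    sims = [cosine_similarity(image_embedding, text_embeddings[label])
            for label in labels]
    probs = softmax([temperature * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Hypothetical toy embeddings; a real pipeline would obtain these from
# CLIP's text encoder applied to LLM-generated stage descriptions.
STAGE_TEXT_EMBEDDINGS = {
    "spore": [1.0, 0.1, 0.0],
    "hyphae": [0.1, 1.0, 0.1],
    "mycelium": [0.0, 0.2, 1.0],
}

if __name__ == "__main__":
    # Hypothetical image embedding, standing in for CLIP's image encoder output.
    image_embedding = [0.05, 0.9, 0.2]
    label, probabilities = zero_shot_classify(image_embedding, STAGE_TEXT_EMBEDDINGS)
    print(label, probabilities)
```

No classifier is trained on labelled fungi images in this sketch; the label set is defined entirely by the text embeddings, which is why richer stage descriptions can change the predictions without touching the image side.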

