CV | Sep 29, 2025

FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology

Faizan Farooq Khan, Yousef Radwan, Eslam Abdelrahman, Abdulwahab Felemban, Aymen Mir, Nico K. Michiels, Andrew J. Temple, Michael L. Berumen, Mohamed Elhoseiny

arXiv:2509.25564v1 | 1 citation | h-index: 24 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for stronger MLLM capabilities in marine biology, where fine-grained species recognition is critical for monitoring ecosystems under anthropogenic pressure. The contribution is incremental, however, as it extends existing resources with a new benchmark.

The paper evaluates multimodal large language models (MLLMs) in marine biology and finds that the best open-source models achieve less than 10% accuracy in fine-grained fish species recognition. To address this, the authors introduce FishNet++, a large-scale benchmark with extensive annotations intended to facilitate specialized model development.

Multimodal large language models (MLLMs) have demonstrated impressive cross-domain capabilities, yet their proficiency in specialized scientific fields like marine biology remains underexplored. In this work, we systematically evaluate state-of-the-art MLLMs and reveal significant limitations in their ability to perform fine-grained recognition of fish species, with the best open-source models achieving less than 10% accuracy. This task is critical for monitoring marine ecosystems under anthropogenic pressure. To address this gap and investigate whether these failures stem from a lack of domain knowledge, we introduce FishNet++, a large-scale, multimodal benchmark. FishNet++ significantly extends existing resources with 35,133 textual descriptions for multimodal learning, 706,426 key-point annotations for morphological studies, and 119,399 bounding boxes for detection. By providing this comprehensive suite of annotations, our work facilitates the development and evaluation of specialized vision-language models capable of advancing aquatic science.
