CLAug 13, 2024

IFShip: Interpretable Fine-grained Ship Classification with Domain Knowledge-Enhanced Vision-Language Models

Mingning Guo, Mengwei Wu, Yuxiang Shen, Haifeng Li, Chao Tao

arXiv:2408.06631v44.810 citationsh-index: 5Has Code

Originality Incremental advance

AI Analysis

This work addresses the interpretability gap in fine-grained ship classification for remote sensing applications, offering a semi-automated solution with a publicly available dataset, though it appears incremental as it adapts existing vision-language models to a specific domain.

The paper tackles the problem of uninterpretable 'black box' models in remote sensing fine-grained ship classification by proposing IFShip, a domain knowledge-enhanced vision-language model that redefines classification as a step-by-step reasoning task with natural language explanations. The result shows that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy, and it provides accurate reasoning when ship types are recognizable to humans and interpretable explanations when they are not.

End-to-end interpretation currently dominates the remote sensing fine-grained ship classification (RS-FGSC) task. However, the inference process remains uninterpretable, leading to criticisms of these models as "black box" systems. To address this issue, we propose a domain knowledge-enhanced Chain-of-Thought (CoT) prompt generation mechanism, which is used to semi-automatically construct a task-specific instruction-following dataset, TITANIC-FGS. By training on TITANIC-FGS, we adapt general-domain vision-language models (VLMs) to the FGSC task, resulting in a model named IFShip. Building upon IFShip, we develop an FGSC visual chatbot that redefines the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language. Experimental results show that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy. Furthermore, compared to VLMs such as LLaVA and MiniGPT-4, IFShip demonstrates superior performance on the FGSC task. It provides an accurate chain of reasoning when fine-grained ship types are recognizable to the human eye and offers interpretable explanations when they are not. Our dataset is publicly available at: https://github.com/lostwolves/IFShip.

View on arXiv PDF Code

Similar