CVOct 26, 2025

Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models

Yang Zhang, Qianyu Zhou, Farhad Imani, Jiong Tang

arXiv:2510.22868v1h-index: 5

Originality Incremental advance

AI Analysis

This provides a data-efficient solution for industrial inspection in wind energy, reducing dependence on extensive labeled datasets, though it is incremental as it builds on existing VLM and RAG methods.

The paper tackles the problem of detecting rare or evolving damage types in wind turbine blades without large labeled datasets by proposing a zero-shot inspection framework that integrates Retrieval-Augmented Generation with Vision-Language Models, achieving correct classification on all 30 test samples.

Wind turbine blades operate in harsh environments, making timely damage detection essential for preventing failures and optimizing maintenance. Drone-based inspection and deep learning are promising, but typically depend on large, labeled datasets, which limit their ability to detect rare or evolving damage types. To address this, we propose a zero-shot-oriented inspection framework that integrates Retrieval-Augmented Generation (RAG) with Vision-Language Models (VLM). A multimodal knowledge base is constructed, comprising technical documentation, representative reference images, and domain-specific guidelines. A hybrid text-image retriever with keyword-aware reranking assembles the most relevant context to condition the VLM at inference, injecting domain knowledge without task-specific training. We evaluate the framework on 30 labeled blade images covering diverse damage categories. Although the dataset is small due to the difficulty of acquiring verified blade imagery, it covers multiple representative defect types. On this test set, the RAG-grounded VLM correctly classified all samples, whereas the same VLM without retrieval performed worse in both accuracy and precision. We further compare against open-vocabulary baselines and incorporate uncertainty Clopper-Pearson confidence intervals to account for the small-sample setting. Ablation studies indicate that the key advantage of the framework lies in explainability and generalizability: retrieved references ground the reasoning process and enable the detection of previously unseen defects by leveraging domain knowledge rather than relying solely on visual cues. This research contributes a data-efficient solution for industrial inspection that reduces dependence on extensive labeled datasets.

View on arXiv PDF

Similar