Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models
This work provides a practical, low-barrier method for utility companies to improve defect recognition in power line insulator inspection, especially when collecting additional real defect data is challenging or slow.
This paper tackles the problem of data scarcity in power line insulator defect inspection by using a multimodal large language model (MLLM) to synthesize defect images. Augmenting a small real training set (104 images) with embedding-selected synthetic images improved the test F1 score from 0.615 to 0.739, a 20% relative increase, corresponding to an estimated 4-5x data-efficiency gain.
Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and inspection datasets are often limited or proprietary. We address this data-scarcity setting by using an off-the-shelf multimodal large language model (MLLM) as a training-free image generator to synthesize defect images from visual references and text prompts. Our pipeline increases diversity via dual-reference conditioning, improves label fidelity with lightweight human verification and prompt refinement, and filters the resulting synthetic pool with an embedding-based selection rule that uses distances to class centroids computed from the real training split. We evaluate on ceramic insulator defect-type classification (shell vs. glaze) using a public dataset in a realistic low training-data regime (104 real training images; 152 validation; 308 test). Augmenting the 10% real training subset (104 images) with embedding-selected synthetic images improves the test F1 score (the harmonic mean of precision and recall) from 0.615 to 0.739, a 20% relative increase that corresponds to an estimated 4-5x data-efficiency gain. The gains persist with stronger backbone models and with frozen-feature linear-probe baselines. These results suggest a practical, low-barrier path for improving defect recognition when collecting additional real defects is slow or infeasible.
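To make the embedding-based selection step concrete, the following is a minimal sketch of one way a centroid-distance filter could work. It is not the paper's implementation: the abstract does not specify the image encoder, the distance metric, or the selection threshold. The function name `select_by_centroid_distance`, the Euclidean metric, and the `keep_fraction` parameter are all assumptions introduced for illustration, and the embeddings are assumed to have been precomputed by some image encoder.

```python
import numpy as np


def select_by_centroid_distance(real_emb, real_labels, syn_emb, syn_labels,
                                keep_fraction=0.5):
    """Return sorted indices of synthetic samples kept after filtering.

    Hypothetical rule: for each class, compute the centroid of the REAL
    training embeddings, then keep the `keep_fraction` of synthetic samples
    of that class that lie closest to it (Euclidean distance, an assumption).
    """
    real_emb = np.asarray(real_emb, dtype=float)
    syn_emb = np.asarray(syn_emb, dtype=float)
    real_labels = np.asarray(real_labels)
    syn_labels = np.asarray(syn_labels)

    keep = []
    for cls in np.unique(real_labels):
        # Class centroid computed from the real training split only.
        centroid = real_emb[real_labels == cls].mean(axis=0)
        idx = np.flatnonzero(syn_labels == cls)
        if idx.size == 0:
            continue  # no synthetic candidates for this class
        dist = np.linalg.norm(syn_emb[idx] - centroid, axis=1)
        # Keep at least one sample per class that has candidates.
        n_keep = max(1, int(round(keep_fraction * idx.size)))
        order = np.argsort(dist, kind="stable")
        keep.extend(idx[order[:n_keep]].tolist())
    return sorted(keep)


# Toy 2-D "embeddings" for the two defect types (illustrative values only).
real_emb = [[0, 0], [0, 2], [10, 10], [10, 12]]
real_labels = ["shell", "shell", "glaze", "glaze"]
syn_emb = [[0, 1], [5, 5], [10, 11], [3, 3]]
syn_labels = ["shell", "shell", "glaze", "glaze"]

selected = select_by_centroid_distance(real_emb, real_labels,
                                       syn_emb, syn_labels)
print(selected)  # -> [0, 2]
```

In this toy example, synthetic samples 1 and 3 are dropped because they sit far from the real centroid of their labeled class. Computing centroids only from the real training split, as the abstract describes, keeps validation and test data out of the selection step.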