CV · Apr 13

Boxes2Pixels: Learning Defect Segmentation from Noisy SAM Masks

arXiv:2604.11162 · 22.5 · h-index: 4 · Has Code
AI Analysis

Boxes2Pixels provides a practical solution for industrial defect segmentation where pixel-level annotations are scarce, improving robustness to noisy pseudo-labels.

Boxes2Pixels addresses noisy SAM pseudo-masks for defect segmentation in industrial inspection, achieving +6.97 anomaly mIoU and +9.71 binary IoU over the strongest baseline on a wind turbine benchmark, with 80% fewer trainable parameters.

Accurate defect segmentation is critical for industrial inspection, yet dense pixel-level annotations are rarely available. A common workaround is to convert inexpensive bounding boxes into pseudo-masks using foundation segmentation models such as the Segment Anything Model (SAM). However, these pseudo-labels are systematically noisy on industrial surfaces, often hallucinating background structure while missing sparse defects. To address this limitation, a noise-robust box-to-pixel distillation framework, Boxes2Pixels, is proposed that treats SAM as a noisy teacher rather than a source of ground-truth supervision. Bounding boxes are converted into pseudo-masks offline by SAM, and a compact student is trained with (i) a hierarchical decoder over frozen DINOv2 features for semantic stability, (ii) an auxiliary binary localization head to decouple sparse foreground discovery from class prediction, and (iii) a one-sided online self-correction mechanism that relaxes background supervision when the student is confident, targeting teacher false negatives. On a manually annotated wind turbine inspection benchmark, the proposed Boxes2Pixels improves anomaly mIoU by +6.97 and binary IoU by +9.71 over the strongest baseline trained under identical weak supervision. Moreover, online self-correction increases the binary recall by +18.56, while the model employs 80% fewer trainable parameters. Code is available at https://github.com/CLendering/Boxes2Pixels.
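
The one-sided self-correction idea in (iii) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how background supervision is relaxed, so the sketch assumes the simplest reading, in which the loss is dropped on pixels that the teacher labels as background but the student predicts as foreground with high confidence. The function names, the threshold `tau`, and the binary cross-entropy loss are all hypothetical choices made for illustration.

```python
import numpy as np


def one_sided_weights(teacher_mask, student_fg_prob, tau=0.9):
    """Per-pixel loss weights for one-sided relaxation (illustrative assumption).

    A pixel gets weight 0 only when the teacher says background (0) AND the
    student's foreground probability is at least tau. Teacher foreground
    pixels are never relaxed, which is what makes the rule one-sided.
    """
    teacher_bg = teacher_mask == 0
    student_confident_fg = student_fg_prob >= tau
    relax = teacher_bg & student_confident_fg
    return np.where(relax, 0.0, 1.0)


def weighted_bce(student_fg_prob, teacher_mask, weights, eps=1e-7):
    """Binary cross-entropy against the teacher mask, averaged over kept pixels."""
    p = np.clip(student_fg_prob, eps, 1.0 - eps)
    per_pixel = -(teacher_mask * np.log(p) + (1.0 - teacher_mask) * np.log(1.0 - p))
    kept = max(float(weights.sum()), 1.0)  # avoid division by zero
    return float((per_pixel * weights).sum() / kept)


# Toy 2x2 example: the teacher missed a defect at the top-left pixel.
teacher = np.array([[0.0, 0.0], [1.0, 0.0]])
student = np.array([[0.95, 0.10], [0.80, 0.50]])

weights = one_sided_weights(teacher, student, tau=0.9)
loss_relaxed = weighted_bce(student, teacher, weights)
loss_plain = weighted_bce(student, teacher, np.ones_like(teacher))
```

In the toy example, only the top-left pixel is relaxed, so the student is no longer penalised for a confident foreground prediction that the teacher missed. Supervision on teacher foreground pixels is left untouched, so the relaxation can only act on suspected teacher false negatives.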
