CVMar 17, 2024

LERENet: Eliminating Intra-class Differences for Metal Surface Defect Few-shot Semantic Segmentation

arXiv:2403.11122v12 citationsh-index: 1
Originality Incremental advance
AI Analysis

This addresses data scarcity and segmentation accuracy for metal defect detection in industrial applications, representing an incremental improvement over existing methods.

The paper tackled the problem of intra-class differences in metal surface defect few-shot semantic segmentation by proposing LERENet, which uses local and global information fusion to improve segmentation, achieving new state-of-the-art results on defect datasets.

Few-shot segmentation models excel in metal defect detection due to their rapid generalization ability to new classes and pixel-level segmentation, rendering them ideal for addressing data scarcity issues and achieving refined object delineation in industrial applications. Existing works neglect the \textit{Intra-Class Differences}, inherent in metal surface defect data, which hinders the model from learning sufficient knowledge from the support set to guide the query set segmentation. Specifically, it can be categorized into two types: the \textit{Semantic Difference} induced by internal factors in metal samples and the \textit{Distortion Difference} caused by external factors of surroundings. To address these differences, we introduce a \textbf{L}ocal d\textbf{E}scriptor based \textbf{R}easoning and \textbf{E}xcitation \textbf{Net}work (\textbf{LERENet}) to learn the two-view guidance, i.e., local and global information from the graph and feature space, and fuse them to segment precisely. Since the relation structure of local features embedded in graph space will help to eliminate \textit{Semantic Difference}, we employ Multi-Prototype Reasoning (MPR) module, extracting local descriptors based prototypes and analyzing local-view feature relevance in support-query pairs. Besides, due to the global information that will assist in countering the \textit{Distortion Difference} in observations, we utilize Multi-Prototype Excitation (MPE) module to capture the global-view relations in support-query pairs. Finally, we employ an Information Fusion Module (IFM) to fuse learned prototypes in local and global views to generate pixel-level masks. Our comprehensive experiments on defect datasets demonstrate that it outperforms existing benchmarks, establishing a new state-of-the-art.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes