SMR-Net: Robot Snap Detection Based on Multi-Scale Features and Self-Attention Network
This work addresses precision and efficiency challenges in robotic assembly for manufacturing, though it appears incremental: it builds on existing object detection methods with targeted enhancements.
The paper tackles the problem of detecting and localizing snaps in automated robotic assembly, where traditional visual methods struggle in complex scenarios such as transparent or low-contrast snaps. Compared with Faster R-CNN, the proposed SMR-Net algorithm improves Intersection over Union (IoU) by 6.52% and 5.8% and mean Average Precision (mAP) by 2.8% and 1.5% on two snap datasets (Type A and Type B).
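For readers less familiar with the localization metric, the sketch below computes the standard IoU of two axis-aligned boxes: the area of their overlap divided by the area of their union. It assumes boxes in (x1, y1, x2, y2) corner format; the paper's own box convention is not stated here, and `box_iou` is a hypothetical helper name rather than code from SMR-Net.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes in corner format."""
    # Overlap rectangle; width and height clamp to 0 when the boxes do not intersect.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1 / 7
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1 means the predicted box matches the ground truth exactly; detection benchmarks typically also use an IoU threshold to decide which predictions count as true positives when computing mAP.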
In automated robotic assembly, the precision and efficiency of snap assembly directly determine overall production quality. As a core prerequisite, snap detection and localization critically affect whether subsequent assembly steps succeed. Traditional visual methods lack robustness and produce large localization errors in complex scenarios (e.g., transparent or low-contrast snaps), so they fail to meet high-precision assembly demands. To address this, this paper designs a dedicated sensor and proposes SMR-Net, a self-attention-based multi-scale object detection algorithm that jointly improves detection and localization performance. SMR-Net adopts an attention-enhanced multi-scale feature fusion architecture: raw sensor data is encoded by a feature extractor with embedded attention, which strengthens key snap features and suppresses noise; three multi-scale feature maps are processed in parallel with standard and dilated convolutions, which unify their dimensions while preserving resolution; and an adaptive reweighting network dynamically assigns weights to the fused features, producing fine-grained representations that combine local detail with global semantics. Experiments on the Type A and Type B snap datasets show that SMR-Net significantly outperforms the traditional Faster R-CNN: Intersection over Union (IoU) improves by 6.52% and 5.8%, and mean Average Precision (mAP) increases by 2.8% and 1.5%, respectively. These results demonstrate the method's advantage in complex snap detection and localization tasks.
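The abstract does not give layer-level details, so the following is only a minimal single-channel sketch, in pure Python, of two mechanisms it names. The first is a 3x3 convolution whose zero padding equals its dilation rate, so that both standard (dilation 1) and dilated convolutions preserve the feature map's height and width. The second is an adaptive reweighting step that mixes the branch outputs with softmax weights. The names `conv2d_same` and `adaptive_fuse`, the global-average scoring, and the assumption that the three maps already share a spatial size are all hypothetical, not SMR-Net's published design; in the actual network the weights would come from a learned reweighting network.

```python
from math import exp

Grid = list[list[float]]  # single-channel feature map, indexed [row][col]


def conv2d_same(x: Grid, k: Grid, dilation: int = 1) -> Grid:
    """3x3 cross-correlation (as in deep-learning frameworks) with zero padding
    equal to `dilation`, so the output has the same height and width as `x`."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    # Kernel taps are spaced `dilation` pixels apart around (i, j).
                    r = i + (di - 1) * dilation
                    c = j + (dj - 1) * dilation
                    if 0 <= r < h and 0 <= c < w:  # positions outside act as zero padding
                        acc += k[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def softmax(v: list[float]) -> list[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def adaptive_fuse(maps: list[Grid], scale: float = 1.0) -> tuple[Grid, list[float]]:
    """Hypothetical reweighting: score each branch by its global average
    (times `scale`), convert scores to softmax weights, return the weighted sum."""
    scores = [scale * sum(map(sum, m)) / (len(m) * len(m[0])) for m in maps]
    weights = softmax(scores)
    h, w = len(maps[0]), len(maps[0][0])
    fused = [[sum(wt * m[i][j] for wt, m in zip(weights, maps)) for j in range(w)]
             for i in range(h)]
    return fused, weights


# Usage: three toy maps standing in for three scales (assumed already 5x5).
base = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
smooth = [[1.0 / 9.0] * 3 for _ in range(3)]  # 3x3 averaging kernel
branches = [
    conv2d_same(base, smooth, dilation=1),                                      # standard
    conv2d_same([[v * 0.5 for v in row] for row in base], smooth, dilation=2),  # dilated
    conv2d_same([[1.0] * 5 for _ in range(5)], smooth, dilation=1),             # standard
]
fused, weights = adaptive_fuse(branches, scale=0.1)
print([round(wt, 3) for wt in weights])
```

Setting the padding equal to the dilation rate is what keeps a 3x3 kernel's output the same size as its input (output size = H + 2d - 2d = H), while the softmax keeps the branch weights positive and summing to 1.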