CVMay 6
UniPCB: A Generation-Assisted Detection Framework for PCB Defect InspectionHuan Zhang, Lianghong Tan, Yichu Xu et al.
Printed Circuit Board (PCB) defect inspection faces two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework that integrates controlled defect synthesis with task-specific defect detection. On the generation side, a Multi-modal Condition Generator extracts complementary edge, depth, and text conditions in parallel. A ScaleEncoder then embeds these conditions into the diffusion U-Net at four resolutions, and a Condition Modulation applies FiLM-style spatially-adaptive modulation at each scale, enabling structurally aligned and defect-aware sample synthesis. On the detection side, an Inverted Residual Shift Attention couples self-attention with shift-wise convolution to jointly capture global context and local texture, and a Cross-level Complementary Fusion Block generates pixel-level gates for selective cross-level feature fusion. The synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection. Extensive experiments on DsPCBSD+ demonstrate that UniPCB achieves mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% on defect detection, surpassing all compared methods, while the generation branch attains an FID of 129.61 and SSIM of 0.619, outperforming existing conditional generation approaches.
CVApr 29, 2025
Unveiling the Underwater World: CLIP Perception Model-Guided Underwater Image EnhancementJiangzhong Cao, Zekai Zeng, Xu Zhang et al.
High-quality underwater images are essential for both machine vision tasks and viewers with their aesthetic appeal.However, the quality of underwater images is severely affected by light absorption and scattering. Deep learning-based methods for Underwater Image Enhancement (UIE) have achieved good performance. However, these methods often overlook considering human perception and lack sufficient constraints within the solution space. Consequently, the enhanced images often suffer from diminished perceptual quality or poor content restoration.To address these issues, we propose a UIE method with a Contrastive Language-Image Pre-Training (CLIP) perception loss module and curriculum contrastive regularization. Above all, to develop a perception model for underwater images that more aligns with human visual perception, the visual semantic feature extraction capability of the CLIP model is leveraged to learn an appropriate prompt pair to map and evaluate the quality of underwater images. This CLIP perception model is then incorporated as a perception loss module into the enhancement network to improve the perceptual quality of enhanced images. Furthermore, the CLIP perception model is integrated with the curriculum contrastive regularization to enhance the constraints imposed on the enhanced images within the CLIP perceptual space, mitigating the risk of both under-enhancement and over-enhancement. Specifically, the CLIP perception model is employed to assess and categorize the learning difficulty level of negatives in the regularization process, ensuring comprehensive and nuanced utilization of distorted images and negatives with varied quality levels. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of visual quality and generalization ability.
CVJul 4, 2025
MRC-DETR: An Adaptive Multi-Residual Coupled Transformer for Bare Board PCB Defect DetectionJiangzhong Cao, Huanqi Wu, Xu Zhang et al.
In modern electronic manufacturing, defect detection on Printed Circuit Boards (PCBs) plays a critical role in ensuring product yield and maintaining the reliability of downstream assembly processes. However, existing methods often suffer from limited feature representation, computational redundancy, and insufficient availability of high-quality training data -- challenges that hinder their ability to meet industrial demands for both accuracy and efficiency. To address these limitations, we propose MRC-DETR, a novel and efficient detection framework tailored for bare PCB defect inspection, built upon the foundation of RT-DETR. Firstly, to enhance feature representation capability, we design a Multi-Residual Directional Coupled Block (MRDCB). This module improves channel-wise feature interaction through a multi-residual structure. Moreover, a cross-spatial learning strategy is integrated to capture fine-grained pixel-level relationships, further enriching the representational power of the extracted features. Secondly, to reduce computational redundancy caused by inefficient cross-layer information fusion, we introduce an Adaptive Screening Pyramid Network (ASPN). This component dynamically filters and aggregates salient low-level features, selectively fusing them with high-level semantic features. By focusing on informative regions and suppressing redundant computations, ASPN significantly improves both efficiency and detection accuracy. Finally, to tackle the issue of insufficient training data, particularly in the context of bare PCBs, we construct a new, high-quality dataset that fills a critical gap in current public resources. Our dataset not only supports the training and evaluation of our proposed framework but also serves as a valuable benchmark for future research in this domain.
CVJun 23, 2021
PatentNet: A Large-Scale Incomplete Multiview, Multimodal, Multilabel Industrial Goods Image DatabaseFangyuan Lei, Da Huang, Jianjian Jiang et al.
In deep learning area, large-scale image datasets bring a breakthrough in the success of object recognition and retrieval. Nowadays, as the embodiment of innovation, the diversity of the industrial goods is significantly larger, in which the incomplete multiview, multimodal and multilabel are different from the traditional dataset. In this paper, we introduce an industrial goods dataset, namely PatentNet, with numerous highly diverse, accurate and detailed annotations of industrial goods images, and corresponding texts. In PatentNet, the images and texts are sourced from design patent. Within over 6M images and corresponding texts of industrial goods labeled manually checked by professionals, PatentNet is the first ongoing industrial goods image database whose varieties are wider than industrial goods datasets used previously for benchmarking. PatentNet organizes millions of images into 32 classes and 219 subclasses based on the Locarno Classification Agreement. Through extensive experiments on image classification, image retrieval and incomplete multiview clustering, we demonstrate that our PatentNet is much more diverse, complex, and challenging, enjoying higher potentials than existing industrial image datasets. Furthermore, the characteristics of incomplete multiview, multimodal and multilabel in PatentNet are able to offer unparalleled opportunities in the artificial intelligence community and beyond.
NENov 22, 2018
Conditioning Optimization of Extreme Learning Machine by Multitask Beetle Antennae Swarm AlgorithmXixian Zhang, Zhijing Yang, Faxian Cao et al.
Extreme learning machine (ELM) as a simple and rapid neural network has been shown its good performance in various areas. Different from the general single hidden layer feedforward neural network (SLFN), the input weights and biases in hidden layer of ELM are generated randomly, so that it only takes a little computation overhead to train the model. However, the strategy of selecting input weights and biases at random may result in ill-posed problem. Aiming to optimize the conditioning of ELM, we propose an effective particle swarm heuristic algorithm called Multitask Beetle Antennae Swarm Algorithm (MBAS), which is inspired by the structures of artificial bee colony (ABS) algorithm and Beetle Antennae Search (BAS) algorithm. Then, the proposed MBAS is applied to optimize the input weights and biases of ELM. Experiment results show that the proposed method is capable of simultaneously reducing the condition number and regression error, and achieving good generalization performances.