CVAug 27, 2024Code
CVPT: Cross Visual Prompt TuningLingyun Huang, Jianxu Mao, Junfei Yi et al.
Parameter-Efficient Fine-Tuning (PEFT) has emerged to mitigate the computational demands of large-scale models. Within computer vision, adapter-based PEFT methods are often favored over prompt-based approaches like Visual Prompt Tuning (VPT) due to the latter's performance and efficiency limitations. Our analysis reveals that VPT's shortcomings stem from its prompt deployment strategy, which can distort the model's inherent self-attention mechanism. To address this, we propose Cross Visual Prompt Tuning (CVPT). CVPT introduces a cross-attention module to directly model interactions between prompts and image tokens. This design decouples the prompts from the input sequence, preserving the original self-attention integrity while enabling efficient feature integration. Furthermore, we employ a weight-sharing mechanism for cross-attention initialization, which enhances representative capability without a large parameter overhead. Extensive experiments across 25 datasets show that CVPT significantly outperforms VPT. For instance, on the VTAB-1K benchmark, CVPT achieves over 4% higher average accuracy, rivaling leading adapter-based methods in both performance and efficiency. Our work confirms that prompt-based methods can achieve exceptional results in visual fine-tuning. The code is available at https://github.com/Lingyun0419/CVPT
CVJun 11, 2024
Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object DetectionJunfei Yi, Jianxu Mao, Tengfei Liu et al.
Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowledge, as it may overly rely on the teacher's imperfect guidance. In this paper, we propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection, termed "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)", which can seamlessly integrate with existing distillation methods. By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model, facilitating deeper exploration of latent knowledge. Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources. Extensive experiments validate the effectiveness of our proposed approach across various distillation strategies, detectors, and backbone architectures. Specifically, following our proposed paradigm, the existing FGD method achieves state-of-the-art (SoTA) performance, with ResNet50-based GFL achieving 44.1% mAP on the COCO dataset, surpassing the baselines by 3.9%.
SPMay 30, 2023
A Graph Reconstruction by Dynamic Signal Coefficient for Fault ClassificationWenbin He, Jianxu Mao, Yaonan Wang et al.
To improve the performance in identifying the faults under strong noise for rotating machinery, this paper presents a dynamic feature reconstruction signal graph method, which plays the key role of the proposed end-to-end fault diagnosis model. Specifically, the original mechanical signal is first decomposed by wavelet packet decomposition (WPD) to obtain multiple subbands including coefficient matrix. Then, with originally defined two feature extraction factors MDD and DDD, a dynamic feature selection method based on L2 energy norm (DFSL) is proposed, which can dynamically select the feature coefficient matrix of WPD based on the difference in the distribution of norm energy, enabling each sub-signal to take adaptive signal reconstruction. Next the coefficient matrices of the optimal feature sub-bands are reconstructed and reorganized to obtain the feature signal graphs. Finally, deep features are extracted from the feature signal graphs by 2D-Convolutional neural network (2D-CNN). Experimental results on a public data platform of a bearing and our laboratory platform of robot grinding show that this method is better than the existing methods under different noise intensities.
CVJun 11, 2020
Minimum Potential Energy of Point Cloud for Robust Global RegistrationZijie Wu, Yaonan Wang, Qing Zhu et al.
In this paper, we propose a novel minimum gravitational potential energy (MPE)-based algorithm for global point set registration. The feature descriptors extraction algorithms have emerged as the standard approach to align point sets in the past few decades. However, the alignment can be challenging to take effect when the point set suffers from raw point data problems such as noises (Gaussian and Uniformly). Different from the most existing point set registration methods which usually extract the descriptors to find correspondences between point sets, our proposed MPE alignment method is able to handle large scale raw data offset without depending on traditional descriptors extraction, whether for the local or global registration methods. We decompose the solution into a global optimal convex approximation and the fast descent process to a local minimum. For the approximation step, the proposed minimum potential energy (MPE) approach consists of two main steps. Firstly, according to the construction of the force traction operator, we could simply compute the position of the potential energy minimum; Secondly, with respect to the finding of the MPE point, we propose a new theory that employs the two flags to observe the status of the registration procedure. The method of fast descent process to the minimum that we employed is the iterative closest point algorithm; it can achieve the global minimum. We demonstrate the performance of the proposed algorithm on synthetic data as well as on real data. The proposed method outperforms the other global methods in terms of both efficiency, accuracy and noise resistance.