Guangwu Qian

CV
h-index12
7papers
436citations
Novelty35%
AI Score44

7 Papers

CVFeb 20, 2023Code
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Xiao Wang, Guangyao Chen, Guangwu Qian et al.

With the urgent demand for generalized deep models, many pre-trained big models are proposed, such as BERT, ViT, GPT, etc. Inspired by the success of these models in single domains (like computer vision and natural language processing), the multi-modal pre-trained big models have also drawn more and more attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper could provide new insights and helps fresh researchers to track the most cutting-edge works. Specifically, we firstly introduce the background of multi-modal pre-training by reviewing the conventional deep learning, pre-training works in natural language process, computer vision, and speech. Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network architectures, and knowledge enhanced pre-training. After that, we introduce the downstream tasks used for the validation of large-scale MM-PTMs, including generative, classification, and regression tasks. We also give visualization and analysis of the model parameters and results on representative downstream tasks. Finally, we point out possible research directions for this topic that may benefit future works. In addition, we maintain a continuously updated paper list for large-scale pre-trained multi-modal big models: https://github.com/wangxiao5791509/MultiModal_BigModels_Survey. This paper has been published by the journal Machine Intelligence Research (MIR), https://link.springer.com/article/10.1007/s11633-022-1410-8, DOI: 10.1007/s11633-022-1410-8, vol. 20, no. 4, pp. 447-482, 2023.

CVDec 4, 2025Code
Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation

Chenlin Xu, Lei Zhang, Lituan Wang et al.

Due to the scarcity of annotated data and the substantial computational costs of model, conventional tuning methods in medical image segmentation face critical challenges. Current approaches to adapting pretrained models, including full-parameter and parameter-efficient fine-tuning, still rely heavily on task-specific training on downstream tasks. Therefore, zero-shot segmentation has gained increasing attention, especially with foundation models such as SAM demonstrating promising generalization capabilities. However, SAM still faces notable limitations on medical datasets due to domain shifts, making efficient zero-shot enhancement an urgent research goal. To address these challenges, we propose BA-TTA-SAM, a task-agnostic test-time adaptation framework that significantly enhances the zero-shot segmentation performance of SAM via test-time adaptation. This framework integrates two key mechanisms: (1) The encoder-level Gaussian prompt injection embeds Gaussian-based prompts directly into the image encoder, providing explicit guidance for initial representation learning. (2) The cross-layer boundary-aware attention alignment exploits the hierarchical feature interactions within the ViT backbone, aligning deep semantic responses with shallow boundary cues. Experiments on four datasets, including ISIC, Kvasir, BUSI, and REFUGE, show an average improvement of 12.4\% in the DICE score compared with SAM's zero-shot segmentation performance. The results demonstrate that our method consistently outperforms state-of-the-art models in medical image segmentation. Our framework significantly enhances the generalization ability of SAM, without requiring any source-domain training data. Extensive experiments on publicly available medical datasets strongly demonstrate the superiority of our framework. Our code is available at https://github.com/Emilychenlin/BA-TTA-SAM.

CVNov 26, 2025Code
Self-Paced Learning for Images of Antinuclear Antibodies

Yiyang Jiang, Guangwu Qian, Jiaxin Wu et al.

Antinuclear antibody (ANA) testing is a crucial method for diagnosing autoimmune disorders, including lupus, Sjögren's syndrome, and scleroderma. Despite its importance, manual ANA detection is slow, labor-intensive, and demands years of training. ANA detection is complicated by over 100 coexisting antibody types, resulting in vast fluorescent pattern combinations. Although machine learning and deep learning have enabled automation, ANA detection in real-world clinical settings presents unique challenges as it involves multi-instance, multi-label (MIML) learning. In this paper, a novel framework for ANA detection is proposed that handles the complexities of MIML tasks using unaltered microscope images without manual preprocessing. Inspired by human labeling logic, it identifies consistent ANA sub-regions and assigns aggregated labels accordingly. These steps are implemented using three task-specific components: an instance sampler, a probabilistic pseudo-label dispatcher, and self-paced weight learning rate coefficients. The instance sampler suppresses low-confidence instances by modeling pattern confidence, while the dispatcher adaptively assigns labels based on instance distinguishability. Self-paced learning adjusts training according to empirical label observations. Our framework overcomes limitations of traditional MIML methods and supports end-to-end optimization. Extensive experiments on one ANA dataset and three public medical MIML benchmarks demonstrate the superiority of our framework. On the ANA dataset, our model achieves up to +7.0% F1-Macro and +12.6% mAP gains over the best prior method, setting new state-of-the-art results. It also ranks top-2 across all key metrics on public datasets, reducing Hamming loss and one-error by up to 18.2% and 26.9%, respectively. The source code can be accessed at https://github.com/fletcherjiang/ANA-SelfPacedLearning.

CVJan 31, 2025Code
Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way

Zhanpeng Luo, Linna Wang, Guangwu Qian et al.

Point cloud completion aims to recover the completed 3D shape of an object from its partial observation caused by occlusion, sensor's limitation, noise, etc. When some key semantic information is lost in the incomplete point cloud, the neural network needs to infer the missing part based on the input information. Intuitively we would apply an autoencoder architecture to solve this kind of problem, which take the incomplete point cloud as input and is supervised by the ground truth. This process that develops model's imagination from incomplete shape to complete shape is done automatically in the latent space. But the knowledge for mapping from incomplete to complete still remains dark and could be further explored. Motivated by the knowledge distillation's teacher-student learning strategy, we design a knowledge transfer way for completing 3d shape. In this work, we propose a novel View Distillation Point Completion Network (VD-PCN), which solve the completion problem by a multi-view distillation way. The design methodology fully leverages the orderliness of 2d pixels, flexibleness of 2d processing and powerfulness of 2d network. Extensive evaluations on PCN, ShapeNet55/34, and MVP datasets confirm the effectiveness of our design and knowledge transfer strategy, both quantitatively and qualitatively. Committed to facilitate ongoing research, we will make our code publicly available.

SEDec 29, 2024
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey

Junqiao Wang, Zeng Zhang, Yangfan He et al.

With the rapid evolution of large language models (LLM), reinforcement learning (RL) has emerged as a pivotal technique for code generation and optimization in various domains. This paper presents a systematic survey of the application of RL in code optimization and generation, highlighting its role in enhancing compiler optimization, resource allocation, and the development of frameworks and tools. Subsequent sections first delve into the intricate processes of compiler optimization, where RL algorithms are leveraged to improve efficiency and resource utilization. The discussion then progresses to the function of RL in resource allocation, emphasizing register allocation and system optimization. We also explore the burgeoning role of frameworks and tools in code generation, examining how RL can be integrated to bolster their capabilities. This survey aims to serve as a comprehensive resource for researchers and practitioners interested in harnessing the power of RL to advance code generation and optimization techniques.

CLApr 27, 2024
Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study

Dou Liu, Ying Han, Xiandi Wang et al.

The integration of Artificial Intelligence (AI) in healthcare presents a transformative potential for enhancing operational efficiency and health outcomes. Large Language Models (LLMs), such as ChatGPT, have shown their capabilities in supporting medical decision-making. Embedding LLMs in medical systems is becoming a promising trend in healthcare development. The potential of ChatGPT to address the triage problem in emergency departments has been examined, while few studies have explored its application in outpatient departments. With a focus on streamlining workflows and enhancing efficiency for outpatient triage, this study specifically aims to evaluate the consistency of responses provided by ChatGPT in outpatient guidance, including both within-version response analysis and between-version comparisons. For within-version, the results indicate that the internal response consistency for ChatGPT-4.0 is significantly higher than ChatGPT-3.5 (p=0.03) and both have a moderate consistency (71.2% for 4.0 and 59.6% for 3.5) in their top recommendation. However, the between-version consistency is relatively low (mean consistency score=1.43/3, median=1), indicating few recommendations match between the two versions. Also, only 50% top recommendations match perfectly in the comparisons. Interestingly, ChatGPT-3.5 responses are more likely to be complete than those from ChatGPT-4.0 (p=0.02), suggesting possible differences in information processing and response generation between the two versions. The findings offer insights into AI-assisted outpatient operations, while also facilitating the exploration of potentials and limitations of LLMs in healthcare utilization. Future research may focus on carefully optimizing LLMs and AI integration in healthcare systems based on ergonomic and human factors principles, precisely aligning with the specific needs of effective outpatient triage.

CVJan 21, 2022
Conceptor Learning for Class Activation Mapping

Guangwu Qian, Zhen-Qun Yang, Xu-Lu Zhang et al.

Class Activation Mapping (CAM) has been widely adopted to generate saliency maps which provides visual explanations for deep neural networks (DNNs). The saliency maps are conventionally generated by fusing the channels of the target feature map using a weighted average scheme. It is a weak model for the inter-channel relation, in the sense that it only models the relation among channels in a contrastive way (i.e., channels that play key roles in the prediction are given higher weights for them to stand out in the fusion). The collaborative relation, which makes the channels work together to provide cross reference, has been ignored. Furthermore, the model has neglected the intra-channel relation thoroughly.In this paper, we address this problem by introducing Conceptor learning into CAM generation. Conceptor leaning has been originally proposed to model the patterns of state changes in recurrent neural networks (RNNs). By relaxing the dependency of Conceptor learning to RNNs, we make Conceptor-CAM not only generalizable to more DNN architectures but also able to learn both the inter- and intra-channel relations for better saliency map generation. Moreover, we have enabled the use of Boolean operations to combine the positive and pseudo-negative evidences, which has made the CAM inference more robust and comprehensive. The effectiveness of Conceptor-CAM has been validated with both formal verifications and experiments on the dataset of the largest scale in literature. The experimental results show that Conceptor-CAM is compatible with and can bring significant improvement to all well recognized CAM-based methods, and has outperformed the state-of-the-art methods by 43.14%~72.79% (88.39%~168.15%) on ILSVRC2012 in Average Increase (Drop), 15.42%~42.55% (47.09%~372.09%) on VOC, and 17.43%~31.32% (47.54%~206.45%) on COCO, respectively.