Chengfeng Zhou

h-index: 5

8 papers

23 citations

Novelty: 48%

AI Score: 31

Ranked #130,653 of 194,257 authors (top 67%); #43,196 in CV (top 73%)

8 Papers

5.0 | CV | Apr 19, 2023 | Code

Learning Robust Visual-Semantic Embedding for Generalizable Person Re-identification

Suncheng Xiang, Jingsheng Gao, Mengyuan Guan et al.

Generalizable person re-identification (Re-ID) is an active research topic in machine learning and computer vision, and it plays a significant role in realistic scenarios due to its various applications in public security and video surveillance. However, previous methods mainly focus on visual representation learning while neglecting the potential of semantic features during training, which easily leads to poor generalization when adapted to a new domain. In this paper, we propose a Multi-Modal Equivalent Transformer called MMET for more robust visual-semantic embedding learning on visual, textual and visual-textual tasks respectively. To further enhance robust feature learning in the context of transformers, a dynamic masking mechanism called the Masked Multimodal Modeling strategy (MMM) is introduced to mask both the image patches and the text tokens; it can work jointly on multimodal or unimodal data and significantly boosts the performance of generalizable person Re-ID. Extensive experiments on benchmark datasets demonstrate the competitive performance of our method over previous approaches. We hope this method could advance the research towards visual-semantic representation learning. Our source code is also publicly available at https://github.com/JeremyXSC/MMET.
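
As a rough illustration of masking both image patches and text tokens, the sketch below masks a fixed fraction of each sequence uniformly at random. The helper `random_mask`, the fixed ratio, and the placeholder strings `[MASK_IMG]` and `[MASK_TXT]` are assumptions made for this example; the abstract describes the masking as dynamic and does not specify these details.

```python
import random

def random_mask(items, mask_ratio, mask_value, rng):
    """Replace a random subset of items with mask_value.

    Returns the masked copy and the sorted list of masked positions.
    Hypothetical helper: uniform random masking at a fixed ratio.
    """
    n_mask = int(round(len(items) * mask_ratio))
    idx = sorted(rng.sample(range(len(items)), n_mask))
    masked = list(items)
    for i in idx:
        masked[i] = mask_value
    return masked, idx

rng = random.Random(0)
patches = [f"patch_{i}" for i in range(8)]          # stand-ins for image patches
tokens = ["a", "man", "in", "a", "red", "jacket"]   # stand-ins for text tokens

masked_patches, patch_idx = random_mask(patches, 0.5, "[MASK_IMG]", rng)
masked_tokens, token_idx = random_mask(tokens, 0.5, "[MASK_TXT]", rng)
```

The same helper can be applied to only one of the two sequences, which mirrors the unimodal case mentioned in the abstract.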

2.8 | CV | Aug 2, 2023 | Code

Colo-ReID: Discriminative Representation Embedding with Meta-learning for Colonoscopic Polyp Re-Identification

Suncheng Xiang, Chengfeng Zhou, Zhengjie Zhang et al.

Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras and plays an important role in the prevention and treatment of colorectal cancer. However, traditional methods for object ReID that directly adopt CNN models trained on the ImageNet dataset usually produce unsatisfactory retrieval performance on colonoscopic datasets due to the large domain gap. Additionally, these methods neglect the potential of self-discrepancy among intra-class or inter-class relations in the colonoscopic polyp dataset, which remains an open research problem in the medical community. To address this, we propose a simple but effective training method named Colo-ReID, which helps our model learn more general and discriminative knowledge based on a meta-learning strategy in scenarios with fewer samples. Based on this, a dynamic Meta-Learning Regulation mechanism called MLR is introduced to further boost the performance of polyp re-identification. Our experimental results show that Colo-ReID consistently outperforms the second-best method in terms of mAP by +2.3% on the polyp re-identification task. Our source code is also publicly available at https://github.com/JeremyXSC/Colo-ReID.
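
For readers unfamiliar with the mAP metric quoted above, here is a minimal sketch of mean average precision for a retrieval task. The function names and the toy identities (`p1`, `p2`, `p3`) are invented for illustration, and the sketch omits protocol details such as same-camera filtering, which the abstract does not describe.

```python
def average_precision(ranked_ids, query_id):
    """Average precision for one query, given gallery ids sorted by similarity."""
    hits = 0
    precisions = []
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / rank)  # precision at each correct match
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    """queries: list of (query_id, ranked_gallery_ids) pairs."""
    aps = [average_precision(ranked, qid) for qid, ranked in queries]
    return sum(aps) / len(aps)

toy_queries = [
    ("p1", ["p1", "p2", "p1"]),  # matches at ranks 1 and 3
    ("p2", ["p3", "p2"]),        # match at rank 2
]
toy_map = mean_average_precision(toy_queries)
```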

1.4 | CV | Jul 7, 2022 | Code

A simple normalization technique using window statistics to improve the out-of-distribution generalization on medical images

Chengfeng Zhou, Songchang Chen, Chenming Xu et al.

Since data scarcity and data heterogeneity are prevalent in medical images, well-trained Convolutional Neural Networks (CNNs) using previous normalization methods may perform poorly when deployed to a new site. However, a reliable model for real-world clinical applications should generalize well on both in-distribution (IND) and out-of-distribution (OOD) data (e.g., the new site data). In this study, we present a novel normalization technique called window normalization (WIN) to improve model generalization on heterogeneous medical images, which is a simple yet effective alternative to existing normalization methods. Specifically, WIN perturbs the normalizing statistics with the local statistics computed on a window of features. This feature-level augmentation technique regularizes the models well and improves their OOD generalization significantly. Building on this, we propose a novel self-distillation method called WIN-WIN for classification tasks. WIN-WIN is easily implemented with two forward passes and a consistency constraint, and can serve as a simple extension to existing methods. Extensive experimental results on various tasks (6 tasks) and datasets (24 datasets) demonstrate the generality and effectiveness of our methods.
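
The idea of perturbing normalizing statistics with window statistics can be sketched on a one-dimensional feature vector as follows. This is only an illustration under stated assumptions: the linear mixing weight `mix`, the random contiguous window, and the function name `window_normalize` are hypothetical choices, not the formulation from the paper.

```python
import random
from statistics import fmean, pvariance

def window_normalize(features, window_size, mix, rng, eps=1e-5):
    """Normalize features with statistics perturbed by a random local window.

    mix = 0.0 falls back to plain normalization with global statistics;
    larger values pull the statistics towards those of the sampled window.
    """
    global_mean, global_var = fmean(features), pvariance(features)

    # Sample a contiguous window and compute its local statistics.
    start = rng.randrange(len(features) - window_size + 1)
    window = features[start:start + window_size]
    window_mean, window_var = fmean(window), pvariance(window)

    # Assumed perturbation: linear interpolation of global and local statistics.
    mean = (1 - mix) * global_mean + mix * window_mean
    var = (1 - mix) * global_var + mix * window_var
    return [(x - mean) / (var + eps) ** 0.5 for x in features]

features = [0.2, 1.5, -0.7, 3.1, 0.9, -1.2, 2.4, 0.0]
out_plain = window_normalize(features, 4, 0.0, random.Random(0))
out_perturbed = window_normalize(features, 4, 0.3, random.Random(0))
```

Under the same assumptions, a WIN-WIN style consistency term would compare model outputs computed from `out_plain` and `out_perturbed`, one forward pass each.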

3.7 | CV | Sep 23, 2024

Generalizing monocular colonoscopy image depth estimation by uncertainty-based global and local fusion network

Sijia Du, Chengfeng Zhou, Suncheng Xiang et al.

Objective: Depth estimation is crucial for endoscopic navigation and manipulation, but obtaining ground-truth depth maps in real clinical scenarios, such as the colon, is challenging. This study aims to develop a robust framework that generalizes well to real colonoscopy images, overcoming challenges like non-Lambertian surface reflection and diverse data distributions. Methods: We propose a framework combining a convolutional neural network (CNN) for capturing local features and a Transformer for capturing global information. An uncertainty-based fusion block is designed to enhance generalization by identifying complementary contributions from the CNN and Transformer branches. The network can be trained with simulated datasets and generalize directly to unseen clinical data without any fine-tuning. Results: Our method is validated on multiple datasets and demonstrates excellent generalization across various datasets and anatomical structures. Furthermore, qualitative analysis in real clinical scenarios confirms the robustness of the proposed method. Conclusion: The integration of local and global features through the CNN-Transformer architecture, along with the uncertainty-based fusion block, improves depth estimation performance and generalization in both simulated and real-world endoscopic environments. Significance: This study offers a novel approach to estimating depth maps for endoscopy images despite the complex conditions in clinical settings, serving as a foundation for automatic endoscopic navigation and other clinical tasks, such as polyp detection and segmentation.
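
One simple way to picture uncertainty-based fusion of two branches is a per-pixel inverse-uncertainty weighted average, sketched below. This is a classical stand-in chosen for illustration; the fusion block in the paper is a learned module whose details are not given in the abstract, and the function name and toy values are assumptions.

```python
def fuse_depth(d_local, u_local, d_global, u_global, eps=1e-8):
    """Fuse two per-pixel depth estimates, trusting the less uncertain one more.

    d_*: depth predictions from the local (CNN-like) and global
         (Transformer-like) branches, as flat lists of equal length.
    u_*: matching non-negative uncertainty estimates.
    """
    fused = []
    for dl, ul, dg, ug in zip(d_local, u_local, d_global, u_global):
        w_local = 1.0 / (ul + eps)    # low uncertainty gives a high weight
        w_global = 1.0 / (ug + eps)
        fused.append((w_local * dl + w_global * dg) / (w_local + w_global))
    return fused

fused = fuse_depth(
    d_local=[1.0, 2.0], u_local=[0.1, 0.9],
    d_global=[3.0, 4.0], u_global=[0.1, 0.1],
)
```

In the toy input, the first pixel has equal uncertainties, so the result is the plain average; the second pixel leans towards the global branch because its uncertainty is lower.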

2.0 | CV | Jul 29, 2024

Towards Open-Set Myoelectric Gesture Recognition via Dual-Perspective Inconsistency Learning

Chen Liu, Can Han, Chengfeng Zhou et al.

Gesture recognition based on surface electromyography (sEMG) has achieved significant progress in human-machine interaction (HMI), especially in prosthetic control and movement rehabilitation. However, accurately recognizing predefined gestures within a closed set is still inadequate in practice; a robust open-set system needs to effectively reject unknown gestures while correctly classifying known ones, which is rarely explored in the field of myoelectric gesture recognition. To handle this challenge, we first report a significant distinction in prediction inconsistency discovered for unknown classes, which arises from different perspectives and can substantially enhance open-set recognition performance. Based on this insight, we propose a novel dual-perspective inconsistency learning approach, PredIN, to explicitly magnify the prediction inconsistency by enhancing the inconsistency of class feature distribution within different perspectives. Specifically, PredIN maximizes the class feature distribution inconsistency among the dual perspectives to enhance their differences. Meanwhile, it optimizes inter-class separability within an individual perspective to maintain individual performance. Comprehensive experiments on various benchmark datasets demonstrate that PredIN outperforms state-of-the-art methods by a clear margin. Our proposed method simultaneously achieves accurate closed-set classification for predefined gestures and effective rejection of unknown gestures, demonstrating its efficacy and superiority in open-set gesture recognition based on sEMG.
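
To make the notion of rejecting unknown gestures via prediction inconsistency concrete, the sketch below compares the class probabilities from two perspectives and rejects a sample when they disagree too much. The distance measure (total variation), the threshold value, and the function names are assumptions for illustration, not the scoring rule used by PredIN.

```python
def inconsistency(p_a, p_b):
    """Total variation distance between two class-probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p_a, p_b))

def open_set_predict(p_a, p_b, threshold):
    """Return a class index, or None when the sample is rejected as unknown."""
    if inconsistency(p_a, p_b) > threshold:
        return None  # the two perspectives disagree: treat as unknown
    averaged = [(a + b) / 2 for a, b in zip(p_a, p_b)]
    return max(range(len(averaged)), key=averaged.__getitem__)

# Perspectives agree: accepted and classified.
known = open_set_predict([0.9, 0.05, 0.05], [0.85, 0.1, 0.05], threshold=0.3)
# Perspectives disagree strongly: rejected.
unknown = open_set_predict([0.7, 0.2, 0.1], [0.1, 0.2, 0.7], threshold=0.3)
```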

1.5 | CV | Dec 5, 2023

Towards Open-set Gesture Recognition via Feature Activation Enhancement and Orthogonal Prototype Learning

Chen Liu, Can Han, Chengfeng Zhou et al.

Gesture recognition is a foundational task in human-machine interaction (HMI). While there has been significant progress in gesture recognition based on surface electromyography (sEMG), accurate recognition of predefined gestures only within a closed set is still inadequate in practice. A robust system must effectively discern and reject unknown gestures that are of no interest. Numerous methods based on prototype learning (PL) have been proposed to tackle this open set recognition (OSR) problem. However, they do not fully explore the inherent distinctions between known and unknown classes. In this paper, we propose a more effective PL method leveraging two novel and inherent distinctions: feature activation level and projection inconsistency. Specifically, the Feature Activation Enhancement Mechanism (FAEM) widens the gap in feature activation values between known and unknown classes. Furthermore, we introduce Orthogonal Prototype Learning (OPL) to construct multiple perspectives. OPL projects a sample from orthogonal directions to maximize the distinction between its two projections, where unknown samples will be projected near the clusters of different known classes while known samples still maintain intra-class similarity. Our proposed method simultaneously achieves accurate closed-set classification for predefined gestures and effective rejection of unknown gestures. Extensive experiments demonstrate its efficacy and superiority in open-set gesture recognition based on sEMG.

4.9 | CL | Feb 3, 2025

OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology

Chengfeng Zhou, Ji Wang, Juanjuan Qin et al.

Large language models (LLMs) have shown significant promise across various medical applications, with ophthalmology being a notable area of focus. Many ophthalmic tasks have shown substantial improvement through the integration of LLMs. However, before these models can be widely adopted in clinical practice, evaluating their capabilities and identifying their limitations is crucial. To address this research gap and support the real-world application of LLMs, we introduce OphthBench, a specialized benchmark designed to assess LLM performance within the context of Chinese ophthalmic practices. This benchmark systematically divides a typical ophthalmic clinical workflow into five key scenarios: Education, Triage, Diagnosis, Treatment, and Prognosis. For each scenario, we developed multiple tasks featuring diverse question types, resulting in a comprehensive benchmark comprising 9 tasks and 591 questions. This framework allows for a thorough assessment of LLMs' capabilities and provides insights into their practical application in Chinese ophthalmology. Using this benchmark, we conducted extensive experiments and analyzed the results from 39 popular LLMs. Our evaluation highlights the current gap between LLM development and its practical utility in clinical settings, providing a clear direction for future advancements. By bridging this gap, we aim to unlock the potential of LLMs and advance their development in ophthalmology.

2.0 | CV | Jan 18, 2024

Skeleton-Guided Instance Separation for Fine-Grained Segmentation in Microscopy

Jun Wang, Chengfeng Zhou, Zhaoyan Ming et al.

One of the fundamental challenges in microscopy (MS) image analysis is instance segmentation (IS), particularly when segmenting cluster regions where multiple objects of varying sizes and shapes may be connected or even overlapping in arbitrary orientations. Existing IS methods usually fail to handle such scenarios, as they rely on coarse instance representations such as keypoints and horizontal bounding boxes (h-bboxes). In this paper, we propose a novel one-stage framework named A2B-IS to address this challenge and enhance the accuracy of IS in MS images. Our approach represents each instance with a pixel-level mask map and a rotated bounding box (r-bbox). Unlike two-stage methods that use box proposals for segmentation, our method decouples mask and box predictions, enabling simultaneous processing to streamline the model pipeline. Additionally, we introduce a Gaussian skeleton map to aid the IS task in two key ways: (1) It guides anchor placement, reducing computational costs while improving the model's capacity to learn RoI-aware features by filtering out noise from background regions. (2) It ensures accurate isolation of densely packed instances by rectifying erroneous box predictions near instance boundaries. To further enhance performance, we integrate two modules into the framework: (1) An Atrous Attention Block (A2B) designed to extract high-resolution feature maps with fine-grained multiscale information, and (2) A Semi-Supervised Learning (SSL) strategy that leverages both labeled and unlabeled images for model training. Our method has been thoroughly validated on two large-scale MS datasets, demonstrating its superiority over most state-of-the-art approaches.
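
To show what a rotated bounding box adds over a horizontal one, the sketch below converts an r-bbox into its four corner points. The `(cx, cy, w, h, angle)` parameterization, with the angle in degrees measured counter-clockwise, is an assumption for this example; the abstract does not state which encoding the method uses.

```python
import math

def rbox_corners(cx, cy, w, h, angle_deg):
    """Corner points of a rotated box centered at (cx, cy).

    Corners are returned in the order of the unrotated box:
    (-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2).
    """
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset about the origin, then translate to the center.
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]

axis_aligned = rbox_corners(0.0, 0.0, 4.0, 2.0, 0.0)
rotated = rbox_corners(0.0, 0.0, 4.0, 2.0, 90.0)
```

With an angle of zero the result coincides with a horizontal bounding box, so the h-bbox is the special case of this representation.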