Zunjie Xiao

CV
h-index16
8papers
98citations
Novelty44%
AI Score44

8 Papers

CVMar 3, 2023
Pyramid Pixel Context Adaption Network for Medical Image Classification with Supervised Contrastive Learning

Xiaoqing Zhang, Zunjie Xiao, Xiao Wu et al.

Spatial attention mechanism has been widely incorporated into deep neural networks (DNNs), significantly lifting the performance in computer vision tasks via long-range dependency modeling. However, it may perform poorly in medical image analysis. Unfortunately, existing efforts are often unaware that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, Pyramid Pixel Context Adaption (PPCA) module, which exploits multi-scale pixel context information to recalibrate pixel position in a pixel-independent manner dynamically. PPCA first applies a well-designed cross-channel pyramid pooling to aggregate multi-scale pixel context information, then eliminates the inconsistency among them by the well-designed pixel normalization, and finally estimates per pixel attention weight via a pixel context integration. By embedding PPCA into a DNN with negligible overhead, the PPCANet is developed for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting the potential of label information via supervised contrastive loss. The extensive experiments on six medical image datasets show that PPCANet outperforms state-of-the-art attention-based networks and recent deep neural networks. We also provide visual analysis and ablation study to explain the behavior of PPCANet in the decision-making process.

CVAug 16, 2024
MM-UNet: A Mixed MLP Architecture for Improved Ophthalmic Image Segmentation

Zunjie Xiao, Xiaoqing Zhang, Risa Higashita et al.

Ophthalmic image segmentation serves as a critical foundation for ocular disease diagnosis. Although fully convolutional neural networks (CNNs) are commonly employed for segmentation, they are constrained by inductive biases and face challenges in establishing long-range dependencies. Transformer-based models address these limitations but introduce substantial computational overhead. Recently, a simple yet efficient Multilayer Perceptron (MLP) architecture was proposed for image classification, achieving competitive performance relative to advanced transformers. However, its effectiveness for ophthalmic image segmentation remains unexplored. In this paper, we introduce MM-UNet, an efficient Mixed MLP model tailored for ophthalmic image segmentation. Within MM-UNet, we propose a multi-scale MLP (MMLP) module that facilitates the interaction of features at various depths through a grouping strategy, enabling simultaneous capture of global and local information. We conducted extensive experiments on both a private anterior segment optical coherence tomography (AS-OCT) image dataset and a public fundus image dataset. The results demonstrated the superiority of our MM-UNet model in comparison to state-of-the-art deep segmentation networks.

CVAug 6, 2024
Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration

Xiaoqing Zhang, Qiushi Nie, Zunjie Xiao et al.

Spatial pooling (SP) and cross-channel pooling (CCP) operators have been applied to aggregate spatial features and pixel-wise features from feature maps in deep neural networks (DNNs), respectively. Their main goal is to reduce computation and memory overhead without visibly weakening the performance of DNNs. However, SP often faces the problem of losing the subtle feature representations, while CCP has a high possibility of ignoring salient feature representations, which may lead to both miscalibration of confidence issues and suboptimal medical classification results. To address these problems, we propose a novel dual-view framework, the first to systematically investigate the relative roles of SP and CCP by analyzing the difference between spatial features and pixel-wise features. Based on this framework, we propose a new pooling method, termed dual-view pyramid pooling (DVPP), to aggregate multi-scale dual-view features. DVPP aims to boost both medical image classification and confidence calibration performance by fully leveraging the merits of SP and CCP operators from a dual-axis perspective. Additionally, we discuss how to fulfill DVPP with five parameter-free implementations. Extensive experiments on six 2D/3D medical image classification tasks show that our DVPP surpasses state-of-the-art pooling methods in terms of medical image classification results and confidence calibration across different DNNs.

CVDec 30, 2025
Pathology Context Recalibration Network for Ocular Disease Recognition

Zunjie Xiao, Xiaoqing Zhang, Risa Higashita et al.

Pathology context and expert experience play significant roles in clinical ocular disease diagnosis. Although deep neural networks (DNNs) have good ocular disease recognition results, they often ignore exploring the clinical pathology context and expert experience priors to improve ocular disease recognition performance and decision-making interpretability. To this end, we first develop a novel Pathology Recalibration Module (PRM) to leverage the potential of pathology context prior via the combination of the well-designed pixel-wise context compression operator and pathology distribution concentration operator; then this paper applies a novel expert prior Guidance Adapter (EPGA) to further highlight significant pixel-wise representation regions by fully mining the expert experience prior. By incorporating PRM and EPGA into the modern DNN, the PCRNet is constructed for automated ocular disease recognition. Additionally, we introduce an Integrated Loss (IL) to boost the ocular disease recognition performance of PCRNet by considering the effects of sample-wise loss distributions and training label frequencies. The extensive experiments on three ocular disease datasets demonstrate the superiority of PCRNet with IL over state-of-the-art attention-based networks and advanced loss methods. Further visualization analysis explains the inherent behavior of PRM and EPGA that affects the decision-making process of DNNs.

CVAug 12, 2025Code
Adaptive Confidence-Wise Loss for Improved Lens Structure Segmentation in AS-OCT

Zunjie Xiao, Xiao Wu, Tianhang Liu et al.

Precise lens structure segmentation is essential for the design of intraocular lenses (IOLs) in cataract surgery. Existing deep segmentation networks typically weight all pixels equally under cross-entropy (CE) loss, overlooking the fact that sub-regions of lens structures are inhomogeneous (e.g., some regions perform better than others) and that boundary regions often suffer from poor segmentation calibration at the pixel level. Clinically, experts annotate different sub-regions of lens structures with varying confidence levels, considering factors such as sub-region proportions, ambiguous boundaries, and lens structure shapes. Motivated by this observation, we propose an Adaptive Confidence-Wise (ACW) loss to group each lens structure sub-region into different confidence sub-regions via a confidence threshold from the unique region aspect, aiming to exploit the potential of expert annotation confidence prior. Specifically, ACW clusters each target region into low-confidence and high-confidence groups and then applies a region-weighted loss to reweigh each confidence group. Moreover, we design an adaptive confidence threshold optimization algorithm to adjust the confidence threshold of ACW dynamically. Additionally, to better quantify the miscalibration errors in boundary region segmentation, we propose a new metric, termed Boundary Expected Calibration Error (BECE). Extensive experiments on a clinical lens structure AS-OCT dataset and other multi-structure datasets demonstrate that our ACW significantly outperforms competitive segmentation loss methods across different deep segmentation networks (e.g., MedSAM). Notably, our method surpasses CE with 6.13% IoU gain, 4.33% DSC increase, and 4.79% BECE reduction in lens structure segmentation under U-Net. The code of this paper is available at https://github.com/XiaoLing12138/Adaptive-Confidence-Wise-Loss.

CVMay 19, 2025Code
Expert-Like Reparameterization of Heterogeneous Pyramid Receptive Fields in Efficient CNNs for Fair Medical Image Classification

Xiao Wu, Xiaoqing Zhang, Zunjie Xiao et al.

Efficient convolutional neural network (CNN) architecture design has attracted growing research interests. However, they typically apply single receptive field (RF), small asymmetric RFs, or pyramid RFs to learn different feature representations, still encountering two significant challenges in medical image classification tasks: 1) They have limitations in capturing diverse lesion characteristics efficiently, e.g., tiny, coordination, small and salient, which have unique roles on the classification results, especially imbalanced medical image classification. 2) The predictions generated by those CNNs are often unfair/biased, bringing a high risk when employing them to real-world medical diagnosis conditions. To tackle these issues, we develop a new concept, Expert-Like Reparameterization of Heterogeneous Pyramid Receptive Fields (ERoHPRF), to simultaneously boost medical image classification performance and fairness. This concept aims to mimic the multi-expert consultation mode by applying the well-designed heterogeneous pyramid RF bag to capture lesion characteristics with varying significances effectively via convolution operations with multiple heterogeneous kernel sizes. Additionally, ERoHPRF introduces an expert-like structural reparameterization technique to merge its parameters with the two-stage strategy, ensuring competitive computation cost and inference speed through comparisons to a single RF. To manifest the effectiveness and generalization ability of ERoHPRF, we incorporate it into mainstream efficient CNN architectures. The extensive experiments show that our proposed ERoHPRF maintains a better trade-off than state-of-the-art methods in terms of medical image classification, fairness, and computation overhead. The code of this paper is available at https://github.com/XiaoLing12138/Expert-Like-Reparameterization-of-Heterogeneous-Pyramid-Receptive-Fields.

CYJul 25, 2025
PEMUTA: Pedagogically-Enriched Multi-Granular Undergraduate Thesis Assessment

Jialu Zhang, Qingyang Sun, Qianyi Wang et al.

The undergraduate thesis (UGTE) plays an indispensable role in assessing a student's cumulative academic development throughout their college years. Although large language models (LLMs) have advanced education intelligence, they typically focus on holistic assessment with only one single evaluation score, but ignore the intricate nuances across multifaceted criteria, limiting their ability to reflect structural criteria, pedagogical objectives, and diverse academic competencies. Meanwhile, pedagogical theories have long informed manual UGTE evaluation through multi-dimensional assessment of cognitive development, disciplinary thinking, and academic performance, yet remain underutilized in automated settings. Motivated by the research gap, we pioneer PEMUTA, a pedagogically-enriched framework that effectively activates domain-specific knowledge from LLMs for multi-granular UGTE assessment. Guided by Vygotsky's theory and Bloom's Taxonomy, PEMUTA incorporates a hierarchical prompting scheme that evaluates UGTEs across six fine-grained dimensions: Structure, Logic, Originality, Writing, Proficiency, and Rigor (SLOWPR), followed by holistic synthesis. Two in-context learning techniques, \ie, few-shot prompting and role-play prompting, are also incorporated to further enhance alignment with expert judgments without fine-tuning. We curate a dataset of authentic UGTEs with expert-provided SLOWPR-aligned annotations to support multi-granular UGTE assessment. Extensive experiments demonstrate that PEMUTA achieves strong alignment with expert evaluations, and exhibits strong potential for fine-grained, pedagogically-informed UGTE evaluations.

IVDec 9, 2020
Machine Learning for Cataract Classification and Grading on Ophthalmic Imaging Modalities: A Survey

Xiaoqing Zhang, Yan Hu, Zunjie Xiao et al.

Cataracts are the leading cause of visual impairment and blindness globally. Over the years, researchers have achieved significant progress in developing state-of-the-art machine learning techniques for automatic cataract classification and grading, aiming to prevent cataracts early and improve clinicians' diagnosis efficiency. This survey provides a comprehensive survey of recent advances in machine learning techniques for cataract classification/grading based on ophthalmic images. We summarize existing literature from two research directions: conventional machine learning methods and deep learning methods. This survey also provides insights into existing works of both merits and limitations. In addition, we discuss several challenges of automatic cataract classification/grading based on machine learning techniques and present possible solutions to these challenges for future research.