CV · Aug 21, 2023 · Code
LDCSF: Local depth convolution-based Swim framework for classifying multi-label histopathology images
Liangrui Pan, Yutao Dou, Zhichao Feng et al.
Histopathological images are the gold standard for diagnosing liver cancer. However, the accuracy of fully digital diagnosis in computational pathology needs to be improved. To address the low accuracy of multi-label classification of histopathology images, we propose a locally deep convolutional Swim framework (LDCSF). To provide diagnostic results for a local field of view, LDCSF consists of a Swin transformer module, a local depth convolution (LDC) module, a feature reconstruction (FR) module, and a ResNet module. The Swin transformer module reduces the computation of the attention mechanism by restricting attention to each window. The LDC module then reconstructs the attention map and performs convolution operations in multiple channels, passing the resulting feature map to the next layer. The FR module takes the dot product of the weight coefficient vectors obtained from the channels with the original feature map matrix to generate representative feature maps. Finally, the residual network performs the classification. The classification accuracy of LDCSF for interstitial area, necrosis, non-tumor, and tumor reached 0.9460, 0.9960, 0.9808, and 0.9847, respectively. We then use the multi-label classification results to calculate the tumor-to-stromal ratio, which lays the foundation for analyzing the microenvironment of liver cancer histopathological images. We also release a multi-label histopathology image dataset of liver cancer; our code and data are available at https://github.com/panliangrui/LSF.
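The saving from restricting attention to windows can be made concrete with the complexity estimates given in the original Swin Transformer paper. The sketch below is illustrative arithmetic only and is not taken from the LDCSF code; the function names are hypothetical, and the feature-map size, channel count, and window size in the usage section are assumed values.

```python
def global_attention_flops(h, w, c):
    """Approximate cost of global multi-head self-attention on an
    h x w feature map with c channels: 4*h*w*c^2 + 2*(h*w)^2*c."""
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def window_attention_flops(h, w, c, m):
    """Approximate cost when attention is limited to non-overlapping
    m x m windows: 4*h*w*c^2 + 2*m^2*h*w*c."""
    n = h * w
    return 4 * n * c * c + 2 * m * m * n * c


if __name__ == "__main__":
    # Assumed example sizes, not values reported for LDCSF.
    h, w, c, m = 56, 56, 96, 7
    g = global_attention_flops(h, w, c)
    l = window_attention_flops(h, w, c, m)
    print(f"global: {g:,}  windowed: {l:,}  ratio: {g / l:.1f}x")
```

The global term grows quadratically with the number of patches, while the windowed term grows linearly for a fixed window size, which is the reduction the abstract refers to.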
CV · Mar 25, 2022
MDAN: Multi-level Dependent Attention Network for Visual Emotion Analysis
Liwen Xu, Zhengtao Wang, Bin Wu et al.
Visual Emotion Analysis (VEA) is attracting increasing attention. One of the biggest challenges of VEA is to bridge the affective gap between visual clues in a picture and the emotion expressed by the picture. As the granularity of emotions increases, the affective gap increases as well. Existing deep approaches try to bridge the gap by directly learning discrimination among emotions globally in one shot, without considering the hierarchical relationship among emotions at different affective levels or the affective level of the emotions to be classified. In this paper, we present the Multi-level Dependent Attention Network (MDAN) with two branches, to leverage the emotion hierarchy and the correlation between different affective levels and semantic levels. The bottom-up branch directly learns emotions at the highest affective level and strictly follows the emotion hierarchy while predicting emotions at lower affective levels. In contrast, the top-down branch attempts to disentangle the affective gap by one-to-one mapping between semantic levels and affective levels, namely, Affective Semantic Mapping. At each semantic level, a local classifier learns discrimination among emotions at the corresponding affective level. We integrate global learning and local learning into a unified deep framework and optimize the network simultaneously. Moreover, to properly extract and leverage channel dependencies and spatial attention while disentangling the affective gap, we carefully design two attention modules: the Multi-head Cross Channel Attention module and the Level-dependent Class Activation Map module. The proposed deep framework obtains new state-of-the-art performance on six VEA benchmarks, where it outperforms existing state-of-the-art methods by a large margin, e.g., +3.85% classification accuracy on the WEBEmo dataset with 25 classes.
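One simple way to keep coarse predictions consistent with an emotion hierarchy is to derive each coarse-level probability by summing the probabilities of its fine-grained children. The sketch below illustrates that idea under stated assumptions: the two-level hierarchy, the emotion names, and the function names are hypothetical, and the abstract does not confirm that MDAN's bottom-up branch is implemented this way.

```python
# Hypothetical two-level hierarchy: fine-grained emotion -> coarse polarity.
PARENT = {
    "amusement": "positive",
    "contentment": "positive",
    "anger": "negative",
    "sadness": "negative",
}


def coarse_from_fine(fine_probs, parent=PARENT):
    """Sum fine-level probabilities into their parent categories."""
    coarse = {}
    for emotion, p in fine_probs.items():
        group = parent[emotion]
        coarse[group] = coarse.get(group, 0.0) + p
    return coarse


def predict(probs):
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    fine = {"amusement": 0.5, "contentment": 0.2, "anger": 0.2, "sadness": 0.1}
    coarse = coarse_from_fine(fine)
    print(predict(fine), predict(coarse), coarse)
```

Because the coarse distribution is computed from the fine one, the two levels cannot assign probability mass inconsistently with the hierarchy.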
CV · Aug 21, 2023
CVFC: Attention-Based Cross-View Feature Consistency for Weakly Supervised Semantic Segmentation of Pathology Images
Liangrui Pan, Lian Wang, Zhichao Feng et al.
Histopathology image segmentation is the gold standard for diagnosing cancer, and can indicate cancer prognosis. However, histopathology image segmentation requires high-quality masks, so many studies now use image-level labels to achieve pixel-level segmentation and reduce the need for fine-grained annotation. To solve this problem, we propose an attention-based cross-view feature consistency end-to-end pseudo-mask generation framework named CVFC. Specifically, CVFC is a three-branch joint framework composed of two Resnet38 and one Resnet50. Each independent branch integrates multi-scale feature maps to generate a class activation map (CAM), and within each branch, down-sampling and expansion adjust the size of the CAM. The middle branch projects the feature matrix to the query and key feature spaces, and generates a feature space perception matrix through a connection layer and inner product to adjust and refine the CAM of each branch. Finally, a feature consistency loss and a feature cross loss optimize the parameters of CVFC in co-training mode. In extensive experiments, CVFC obtains an IoU of 0.7122 and a fwIoU of 0.7018 on the WSSS4LUAD dataset, outperforming HistoSegNet, SEAM, C-CAM, WSSS-Tissue, and OEEM.
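For readers unfamiliar with the two reported metrics, the sketch below computes mean IoU and frequency-weighted IoU (fwIoU) from a confusion matrix using their standard definitions. It is a minimal illustration, not the CVFC evaluation code; the function names and the toy confusion matrix are assumptions, and the abstract does not state whether its reported IoU is a class average.

```python
def per_class_iou(conf):
    """conf[i][j] = number of pixels with true class i predicted as j."""
    k = len(conf)
    ious = []
    for c in range(k):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                         # true c, predicted other
        fp = sum(conf[r][c] for r in range(k)) - tp    # predicted c, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return ious


def mean_iou(conf):
    """Unweighted average of the per-class IoU values."""
    ious = per_class_iou(conf)
    return sum(ious) / len(ious)


def fw_iou(conf):
    """Per-class IoU weighted by each class's share of ground-truth pixels."""
    total = sum(sum(row) for row in conf)
    ious = per_class_iou(conf)
    return sum(sum(conf[c]) / total * ious[c] for c in range(len(conf)))


if __name__ == "__main__":
    toy = [[3, 1], [1, 5]]  # toy 2-class confusion matrix
    print(per_class_iou(toy), mean_iou(toy), fw_iou(toy))
```

fwIoU gives more weight to frequent tissue classes, so it can differ noticeably from mean IoU when class areas are imbalanced.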
CV · Oct 20, 2022
MGTUNet: An new UNet for colon nuclei instance segmentation and quantification
Liangrui Pan, Lian Wang, Zhichao Feng et al.
Colorectal cancer (CRC) is among the top three malignant tumor types in terms of morbidity and mortality. Histopathological images are the gold standard for diagnosing colon cancer. Cellular nuclei instance segmentation and classification, and nuclear component regression tasks can aid in the analysis of the tumor microenvironment in colon tissue. Traditional methods are still unable to handle both types of tasks end-to-end at the same time, and have poor prediction accuracy and high application costs. This paper proposes a new UNet model for handling nuclei based on the UNet framework, called MGTUNet, which uses Mish, Group normalization, and transposed convolution layers to improve the segmentation model, and a Ranger optimizer to minimize the SmoothL1Loss. It also uses different channels to segment and classify different types of nuclei, so that the nuclei instance segmentation and classification task and the nuclei component regression task are completed simultaneously. Finally, we ran extensive comparison experiments using eight segmentation models. In a comparison of three evaluation metrics and the parameter sizes of the models, MGTUNet obtained 0.6254 on PQ, 0.6359 on mPQ, and 0.8695 on R². These experiments indicate that MGTUNet is a state-of-the-art method for quantifying histopathological images of colon cancer.
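Two of the named components have short closed-form definitions, sketched below in plain Python for reference. Mish is x · tanh(softplus(x)), and the smooth L1 loss follows the definition used by PyTorch's `SmoothL1Loss` with its `beta` parameter. This is a scalar illustration with hypothetical function names, not the MGTUNet implementation.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors (|d| < beta), linear otherwise."""
    d = abs(pred - target)
    if d < beta:
        return 0.5 * d * d / beta
    return d - 0.5 * beta


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.0):
        print(x, mish(x))
    print(smooth_l1(0.5, 0.0), smooth_l1(2.0, 0.0))
```

The linear tail of smooth L1 makes a regression target such as a nuclei count less sensitive to outliers than a plain squared error.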
LG · Jul 9, 2023
DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data
Liangrui Pan, Xiang Wang, Qingchun Liang et al.
Background and Objective: Given the high heterogeneity and clinical diversity of cancer, substantial variations exist in multi-omics data and clinical features across different cancer subtypes. Methods: We propose a model, named DEDUCE, based on symmetric multi-head attention encoders (SMAE), for unsupervised contrastive learning to analyze multi-omics cancer data, with the aim of identifying and characterizing cancer subtypes. This model adopts an unsupervised SMAE that can deeply extract contextual features and long-range dependencies from multi-omics data, thereby mitigating the impact of noise. Importantly, DEDUCE introduces a subtype decoupled contrastive learning method based on a multi-head attention mechanism to simultaneously learn features from multi-omics data and perform clustering for identifying cancer subtypes. Subtypes are clustered by calculating the similarity between samples in both the feature space and sample space of multi-omics data. The fundamental concept involves decoupling various attributes of multi-omics data features and learning them as contrasting terms. A contrastive loss function is constructed to quantify the disparity between positive and negative examples, and the model minimizes this difference, thereby promoting the acquisition of enhanced feature representation. Results: In extensive experiments on simulated multi-omics datasets, single-cell multi-omics datasets, and cancer multi-omics datasets, the DEDUCE model outperforms 10 deep learning models, including state-of-the-art methods, and ablation experiments demonstrate the effectiveness of each module in the model. Finally, we applied the DEDUCE model to identify six cancer subtypes of AML.
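To make the idea of a contrastive loss concrete, the sketch below compares a standard InfoNCE-style loss with one common "decoupled" variant that drops the positive pair from the denominator. The abstract does not give DEDUCE's loss, so this is an assumption-laden illustration: the function names, the cosine similarity, and the temperature `tau` are all hypothetical choices, and the actual subtype decoupled loss may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, tau=0.5):
    """Standard form: the positive pair also appears in the denominator."""
    pos = cosine(anchor, positive) / tau
    negs = [cosine(anchor, n) / tau for n in negatives]
    return -pos + math.log(math.exp(pos) + sum(math.exp(s) for s in negs))


def decoupled_loss(anchor, positive, negatives, tau=0.5):
    """Decoupled form (assumed): only negatives appear in the denominator."""
    pos = cosine(anchor, positive) / tau
    negs = [cosine(anchor, n) / tau for n in negatives]
    return -pos + math.log(sum(math.exp(s) for s in negs))


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    negatives = [[0.0, 1.0], [-1.0, 0.0]]
    print(decoupled_loss(anchor, [1.0, 0.0], negatives))
    print(decoupled_loss(anchor, [0.0, 1.0], negatives))
```

In both forms the loss falls as the positive sample moves closer to the anchor and rises as negatives move closer, which is the "disparity between positive and negative examples" the abstract describes.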
CV · Mar 14, 2024 · Code
SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival
Liangrui Pan, Yijun Peng, Yan Li et al.
Accurately predicting the survival rate of cancer patients is crucial for aiding clinicians in planning appropriate treatment, reducing cancer-related medical expenses, and significantly enhancing patients' quality of life. Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach. However, existing methods still grapple with challenges related to missing multimodal data and information interaction within modalities. This paper introduces SELECTOR, a heterogeneous graph-aware network based on convolutional mask encoders for robust multimodal prediction of cancer patient survival. SELECTOR comprises feature edge reconstruction, convolutional mask encoder, feature cross-fusion, and multimodal survival prediction modules. Initially, we construct a multimodal heterogeneous graph and employ the meta-path method for feature edge reconstruction, ensuring comprehensive incorporation of feature information from graph edges and effective embedding of nodes. To mitigate the impact of missing features within the modality on prediction accuracy, we devise a convolutional masked autoencoder (CMAE) to process the heterogeneous graph post-feature reconstruction. Subsequently, the feature cross-fusion module facilitates communication between modalities, ensuring that output features encompass all features of the modality and relevant information from other modalities. Extensive experiments and analysis on six cancer datasets from TCGA demonstrate that our method significantly outperforms state-of-the-art methods in both modality-missing and intra-modality information-confirmed cases. Our code is available at https://github.com/panliangrui/Selector.
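The general masked-autoencoder idea behind handling missing features is to hide part of the input during training and score the model on how well it reconstructs the hidden part. The sketch below shows only that masking and scoring step on a flat feature vector. It is a generic illustration with hypothetical function names and does not reproduce SELECTOR's CMAE, which operates convolutionally on a heterogeneous graph.

```python
import random


def mask_features(features, mask_ratio, seed=0):
    """Zero out a random subset of features; return the masked copy
    and the sorted indices that were hidden."""
    rng = random.Random(seed)
    n = len(features)
    n_mask = round(mask_ratio * n)
    masked_idx = sorted(rng.sample(range(n), n_mask))
    masked = list(features)
    for i in masked_idx:
        masked[i] = 0.0
    return masked, masked_idx


def masked_mse(original, reconstructed, masked_idx):
    """Mean squared reconstruction error over the hidden positions only."""
    if not masked_idx:
        return 0.0
    sq = sum((original[i] - reconstructed[i]) ** 2 for i in masked_idx)
    return sq / len(masked_idx)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    x_masked, idx = mask_features(x, mask_ratio=0.5)
    # An untrained "model" that returns its input scores poorly on hidden slots.
    print(idx, masked_mse(x, x_masked, idx))
```

Scoring only the hidden positions forces the encoder to infer missing values from the visible ones, which is the property that makes this family of models useful when modality features are incomplete.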
CV · May 13, 2024
FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival
Liangrui Pan, Yijun Peng, Yan Li et al.
Integrating the different data modalities of cancer patients can significantly improve the predictive performance of patient survival. However, most existing methods ignore the simultaneous utilization of rich semantic features at different scales in pathology images. When collecting multimodal data and extracting features, there is a likelihood of encountering intra-modality missing data, introducing noise into the multimodal data. To address these challenges, this paper proposes a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information. Specifically, the cross-fusion transformer effectively utilizes features at the cellular level, tissue level, and tumor heterogeneity level to correlate prognosis through a cross-scale feature cross-fusion method. This enhances the ability of pathological image feature representation. Secondly, the hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features and local detail features of the molecular data. HAE's channel attention module obtains global features of molecular data. Furthermore, to address the issue of missing information within modalities, we propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods on four benchmark datasets in both complete and missing settings.
RO · Aug 30, 2025
FLUID: A Fine-Grained Lightweight Urban Signalized-Intersection Dataset of Dense Conflict Trajectories
Yiyang Chen, Zhigang Wu, Guohong Zheng et al.
The trajectory data of traffic participants (TPs) is a fundamental resource for evaluating traffic conditions and optimizing policies, especially at urban intersections. Although data acquisition using drones is efficient, existing datasets still have limitations in scene representativeness, information richness, and data fidelity. This study introduces FLUID, comprising a fine-grained trajectory dataset that captures dense conflicts at typical urban signalized intersections, and a lightweight, full-pipeline framework for drone-based trajectory processing. FLUID covers three distinct intersection types, with approximately 5 hours of recording time and featuring over 20,000 TPs across 8 categories. Notably, the dataset averages two vehicle conflicts per minute, involving roughly 25% of all motor vehicles. FLUID provides comprehensive data, including trajectories, traffic signals, maps, and raw videos. Comparison with the DataFromSky platform and ground-truth measurements validates its high spatio-temporal accuracy. Through a detailed classification of motor vehicle conflicts and violations, FLUID reveals a diversity of interactive behaviors, demonstrating its value for human preference mining, traffic behavior modeling, and autonomous driving research.