Shaoliang Peng

CV
h-index: 11
23 papers
241 citations
Novelty: 39%
AI Score: 51

23 Papers

CV | Aug 21, 2023 | Code
LDCSF: Local depth convolution-based Swim framework for classifying multi-label histopathology images

Liangrui Pan, Yutao Dou, Zhichao Feng et al.

Histopathological images are the gold standard for diagnosing liver cancer. However, the accuracy of fully digital diagnosis in computational pathology needs to be improved. To address the multi-label nature and low classification accuracy of histopathology images, we propose a locally deep convolutional Swim framework (LDCSF) to classify multi-label histopathology images. To provide diagnostic results for local fields of view, the LDCSF model consists of a Swin transformer module, a local depth convolution (LDC) module, a feature reconstruction (FR) module, and a ResNet module. The Swin transformer module reduces the computation required by the attention mechanism by limiting attention to each window. The LDC module then reconstructs the attention map and performs convolution operations in multiple channels, passing the resulting feature map to the next layer. The FR module takes the dot product of the weight coefficient vectors obtained from the channels with the original feature map matrix to generate representative feature maps. Finally, the residual network performs the classification. The classification accuracy of LDCSF for interstitial area, necrosis, non-tumor, and tumor reached 0.9460, 0.9960, 0.9808, and 0.9847, respectively. We then use the multi-label classification results to calculate the tumor-to-stromal ratio, which lays the foundation for analyzing the microenvironment of liver cancer histopathological images. We also released a multi-label histopathology image dataset of liver cancer; our code and data are available at https://github.com/panliangrui/LSF.
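
The saving from window-limited attention mentioned above can be illustrated with the complexity estimates published for the Swin Transformer. This is a rough sketch and not code from LDCSF: the function names are hypothetical, and the feature-map size, channel count, and window size in the usage lines are assumed values.

```python
def global_msa_cost(h: int, w: int, c: int) -> int:
    """Approximate multiply-adds of global self-attention on an h x w map with c channels."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_cost(h: int, w: int, c: int, m: int) -> int:
    """Approximate multiply-adds when attention is restricted to m x m windows."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


# Assumed example sizes (not taken from the paper).
ratio = global_msa_cost(56, 56, 96) / window_msa_cost(56, 56, 96, 7)
print(f"global / windowed cost ratio: {ratio:.1f}")
```

The quadratic term in the number of tokens is replaced by a term that is linear in the number of tokens for a fixed window size, which is where the reduction comes from.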

SE | Mar 28, 2023
One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization

Deze Wang, Boxing Chen, Shanshan Li et al.

As pre-trained models automate many code intelligence tasks, a widely used paradigm is to fine-tune a model on the task dataset for each programming language. A recent study reported that multilingual fine-tuning benefits a range of tasks and models. However, we find that multilingual fine-tuning leads to performance degradation on the recent models UniXcoder and CodeT5. To alleviate the potentially catastrophic forgetting issue in multilingual models, we fix all pre-trained model parameters, insert a parameter-efficient adapter structure, and fine-tune it. Updating only 0.6% of the overall parameters compared to full-model fine-tuning for each programming language, adapter tuning yields consistent improvements on code search and summarization tasks, achieving state-of-the-art results. In addition, we experimentally show its effectiveness in cross-lingual and low-resource scenarios. Multilingual fine-tuning with 200 samples per programming language approaches the results of fine-tuning on the entire dataset for code summarization. Our experiments on three probing tasks show that adapter tuning significantly outperforms full-model fine-tuning and effectively overcomes catastrophic forgetting.
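
To give a feel for why adapter tuning touches so few parameters, the sketch below counts the weights of a generic bottleneck adapter (down-projection, nonlinearity, up-projection) and the resulting trainable fraction. It is an illustration only: the function names and all sizes in the usage lines are assumptions, and it does not reproduce the 0.6% figure or the exact adapter placement used in the paper.

```python
def adapter_params(hidden: int, bottleneck: int) -> int:
    """Weights and biases of one down-projection plus one up-projection."""
    down = hidden * bottleneck + bottleneck
    up = bottleneck * hidden + hidden
    return down + up


def trainable_fraction(base_params: int, hidden: int, bottleneck: int,
                       n_layers: int, adapters_per_layer: int = 2) -> float:
    """Share of parameters that are trained when the base model stays frozen."""
    added = adapter_params(hidden, bottleneck) * n_layers * adapters_per_layer
    return added / (base_params + added)


# Assumed sizes for illustration: a 12-layer model with hidden size 768.
print(trainable_fraction(base_params=125_000_000, hidden=768,
                         bottleneck=64, n_layers=12))
```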

IV | Apr 29, 2022
Noise-reducing attention cross fusion learning transformer for histological image classification of osteosarcoma

Liangrui Pan, Hetian Wang, Lian Wang et al.

The degree of malignancy of osteosarcoma and its tendency to metastasize/spread mainly depend on the pathological grade (determined by observing the morphology of the tumor under a microscope). The purpose of this study is to use artificial intelligence to classify osteosarcoma histological images and to assess tumor survival and necrosis, which will help doctors reduce their workload, improve the accuracy of osteosarcoma detection, and make a better prognosis for patients. The study proposes a transformer image classification framework that integrates a noise reduction convolutional autoencoder and feature cross fusion learning (NRCA-FCFL) to classify osteosarcoma histological images. The noise reduction convolutional autoencoder denoises histological images of osteosarcoma, yielding cleaner images for classification. Moreover, we introduce feature cross fusion learning, which integrates image patches at two scales and explores their interactions by using additional classification tokens. As a result, a refined fusion feature is generated, which is fed to the residual neural network for label prediction. We conduct extensive experiments to evaluate the performance of the proposed approach. The experimental results demonstrate that our method outperforms traditional and deep learning approaches on various evaluation metrics, with an accuracy of 99.17% to support osteosarcoma diagnosis.

IV | May 31, 2022
A review of machine learning approaches, challenges and prospects for computational tumor pathology

Liangrui Pan, Zhichao Feng, Shaoliang Peng

Computational pathology is part of precision oncology medicine. The integration of high-throughput data including genomics, transcriptomics, proteomics, metabolomics, pathomics, and radiomics into clinical practice improves cancer treatment plans, treatment cycles, and cure rates, and helps doctors open up innovative approaches to patient prognosis. In the past decade, rapid advances in artificial intelligence, chip design and manufacturing, and mobile computing have facilitated research in computational pathology and have the potential to provide better-integrated solutions for whole-slide images, multi-omics data, and clinical informatics. However, tumor computational pathology now brings some challenges to the application of tumour screening, diagnosis and prognosis in terms of data integration, hardware processing, network sharing bandwidth and machine learning technology. This review investigates image preprocessing methods in computational pathology from a pathological and technical perspective, machine learning-based methods, and applications of computational pathology in breast, colon, prostate, lung, and various tumour disease scenarios. Finally, the challenges and prospects of machine learning in computational pathology applications are discussed.

CV | Aug 21, 2023
CVFC: Attention-Based Cross-View Feature Consistency for Weakly Supervised Semantic Segmentation of Pathology Images

Liangrui Pan, Lian Wang, Zhichao Feng et al.

Histopathology image segmentation is the gold standard for diagnosing cancer, and can indicate cancer prognosis. However, histopathology image segmentation requires high-quality masks, so many studies now use image-level labels to achieve pixel-level segmentation and reduce the need for fine-grained annotation. To solve this problem, we propose CVFC, an attention-based cross-view feature consistency framework for end-to-end pseudo-mask generation. Specifically, CVFC is a three-branch joint framework composed of two Resnet38 and one Resnet50. Each independent branch integrates multi-scale feature maps to generate a class activation map (CAM), and down-sampling and expansion adjust the size of the CAM in each branch. The middle branch projects the feature matrix to the query and key feature spaces, and generates a feature space perception matrix through the connection layer and inner product to adjust and refine the CAM of each branch. Finally, a feature consistency loss and a feature cross loss optimize the parameters of CVFC in co-training mode. In extensive experiments, CVFC obtains an IoU of 0.7122 and a fwIoU of 0.7018 on the WSSS4LUAD dataset, outperforming HistoSegNet, SEAM, C-CAM, WSSS-Tissue, and OEEM.
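
The two reported metrics can be computed from a pixel-level confusion matrix. The sketch below uses the usual definitions of per-class IoU, mean IoU, and frequency-weighted IoU (fwIoU); whether the paper's "IoU" is the class mean is an assumption, and the example matrix is made up.

```python
def per_class_iou(conf):
    """conf[i][j] = number of pixels of true class i predicted as class j."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return ious


def mean_iou(conf):
    ious = per_class_iou(conf)
    return sum(ious) / len(ious)


def fw_iou(conf):
    """Weight each class IoU by the share of ground-truth pixels in that class."""
    total = sum(sum(row) for row in conf)
    ious = per_class_iou(conf)
    return sum(sum(conf[c]) / total * ious[c] for c in range(len(conf)))


example = [[3, 1], [1, 5]]  # made-up two-class confusion matrix
print(mean_iou(example), fw_iou(example))
```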

CV | Oct 20, 2022
MGTUNet: An new UNet for colon nuclei instance segmentation and quantification

Liangrui Pan, Lian Wang, Zhichao Feng et al.

Colorectal cancer (CRC) is among the top three malignant tumor types in terms of morbidity and mortality. Histopathological images are the gold standard for diagnosing colon cancer. Cellular nuclei instance segmentation and classification, and nuclear component regression tasks can aid in the analysis of the tumor microenvironment in colon tissue. Traditional methods are still unable to handle both types of tasks end-to-end at the same time, and have poor prediction accuracy and high application costs. This paper proposes a new model for handling nuclei based on the UNet framework, called MGTUNet, which uses Mish, group normalization, and transposed convolution layers to improve the segmentation model, and a Ranger optimizer to adjust the SmoothL1Loss values. Secondly, it uses different channels to segment and classify different types of nuclei, ultimately completing the nuclei instance segmentation and classification task and the nuclei component regression task simultaneously. Finally, we conducted extensive comparison experiments using eight segmentation models. Comparing three evaluation metrics and the parameter sizes of the models, MGTUNet obtained 0.6254 on PQ, 0.6359 on mPQ, and 0.8695 on R². Thus, the experiments demonstrated that MGTUNet is now a state-of-the-art method for quantifying histopathological images of colon cancer.
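
Two of the building blocks named above have simple closed forms: the Mish activation, x * tanh(softplus(x)), and the smooth L1 loss. The scalar sketch below follows those standard definitions and is not taken from the MGTUNet code; the `beta` parameter mirrors the common formulation and its default of 1.0 is an assumption.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


print(mish(1.0), smooth_l1(0.5, 0.0), smooth_l1(3.0, 0.0))
```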

QM | Aug 21, 2023
PACS: Prediction and analysis of cancer subtypes from multi-omics data based on a multi-head attention mechanism model

Liangrui Pan, Dazheng Liu, Zhichao Feng et al.

Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical characteristics among different cancer subtypes. Therefore, accurate classification of cancer subtypes can help doctors choose the most appropriate treatment options, improve treatment outcomes, and provide more accurate patient survival predictions. In this study, we propose a supervised multi-head attention mechanism model (SMA) to classify cancer subtypes. The attention mechanism and feature sharing module of the SMA model learn the global and local feature information of multi-omics data. In addition, the fusion module enriches the parameters of the model by deeply fusing the multi-head attention encoders of a Siamese network. In extensive experiments, the SMA model achieves the highest accuracy, macro-F1, and weighted-F1 in classifying cancer subtypes on simulated, single-cell, and cancer multi-omics datasets compared to AE, CNN, and GNN-based models. Our attention-based approach therefore contributes to future research on multi-omics data.

LG | Jul 9, 2023
DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data

Liangrui Pan, Xiang Wang, Qingchun Liang et al.

Background and Objective: Given the high heterogeneity and clinical diversity of cancer, substantial variations exist in multi-omics data and clinical features across different cancer subtypes. Methods: We propose a model, named DEDUCE, based on a symmetric multi-head attention encoder (SMAE), for unsupervised contrastive learning to analyze multi-omics cancer data, with the aim of identifying and characterizing cancer subtypes. This model adopts an unsupervised SMAE that can deeply extract contextual features and long-range dependencies from multi-omics data, thereby mitigating the impact of noise. Importantly, DEDUCE introduces a subtype decoupled contrastive learning method based on a multi-head attention mechanism to simultaneously learn features from multi-omics data and perform clustering for identifying cancer subtypes. Subtypes are clustered by calculating the similarity between samples in both the feature space and sample space of multi-omics data. The fundamental concept involves decoupling various attributes of multi-omics data features and learning them as contrasting terms. A contrastive loss function is constructed to quantify the disparity between positive and negative examples, and the model minimizes this difference, thereby promoting the acquisition of enhanced feature representation. Results: The DEDUCE model undergoes extensive experiments on simulated multi-omics datasets, single-cell multi-omics datasets, and cancer multi-omics datasets, outperforming 10 deep learning models. The DEDUCE model outperforms state-of-the-art methods, and ablation experiments demonstrate the effectiveness of each module in the DEDUCE model. Finally, we applied the DEDUCE model to identify six cancer subtypes of AML.
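
The abstract does not spell out the loss, so the following is only a generic sketch of how a decoupled contrastive loss differs from the standard InfoNCE loss for a single anchor: the positive pair is dropped from the denominator. The function names, the temperature default, and the similarity values are assumptions, and DEDUCE's actual formulation may differ.

```python
import math


def _log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def info_nce(pos_sim: float, neg_sims, tau: float = 0.5) -> float:
    """Standard InfoNCE: the positive pair also appears in the denominator."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    return -pos_sim / tau + _log_sum_exp(logits)


def decoupled_contrastive(pos_sim: float, neg_sims, tau: float = 0.5) -> float:
    """Decoupled variant: only negative pairs contribute to the denominator."""
    logits = [s / tau for s in neg_sims]
    return -pos_sim / tau + _log_sum_exp(logits)


# Made-up cosine similarities for one anchor.
print(info_nce(0.9, [0.1, -0.2, 0.3]), decoupled_contrastive(0.9, [0.1, -0.2, 0.3]))
```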

11.9 | CV | May 15
Diffusion Attention Expert Model for Predicting and Semi-automatic Localizing STAS in Lung Cancer Histopathological Images

Liangrui Pan, Jiadi Luo, Yuxuan Xiao et al.

Accurate intraoperative and postoperative diagnosis of spread through air spaces (STAS) is essential for guiding surgical decisions and postoperative management in lung cancer. However, histopathological assessment is labor-intensive and is prone to missed or incorrect diagnoses. We propose a Diffusion Attention Expert Model (DAEM) to detect STAS in frozen sections (FSs) and paraffin sections (PSs). Its diffusion attention expert module leverages full attention aggregation to learn multi-scale features from histopathological images, while a dual-branch architecture strengthens multi-scale feature representation. On an internal dataset, DAEM achieves AUCs of 0.8946 for FSs and 0.9112 for PSs. Validation on external multi-center datasets from eight institutions demonstrates strong generalizability and interpretability. Using tumor microenvironment (TME) features in PSs, we further enable semi-automatic measurement of STAS location and its distance from the primary tumor. Several quantitative TME metrics are identified as potential biomarkers for STAS, including micropapillary-type STAS. Overall, DAEM offers a clinically actionable framework for STAS assessment by enabling accurate and interpretable detection on FSs and PSs, supporting postoperative risk stratification through quantitative TME-based analysis.

LG | Sep 30, 2025 | Code
Annotation-Efficient Active Test-Time Adaptation with Conformal Prediction

Tingyu Shi, Fan Lyu, Shaoliang Peng

Active Test-Time Adaptation (ATTA) improves model robustness under domain shift by selectively querying human annotations at deployment, but existing methods use heuristic uncertainty measures and suffer from low data selection efficiency, wasting human annotation budget. We propose Conformal Prediction Active TTA (CPATTA), which is the first to bring principled, coverage-guaranteed uncertainty into ATTA. CPATTA employs smoothed conformal scores with a top-K certainty measure, an online weight-update algorithm driven by pseudo coverage, a domain-shift detector that adapts human supervision, and a staged update scheme that balances human-labeled and model-labeled data. Extensive experiments demonstrate that CPATTA consistently outperforms state-of-the-art ATTA methods by around 5% in accuracy. Our code and datasets are available at https://github.com/tingyushi/CPATTA.
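
For readers unfamiliar with conformal prediction, the sketch below shows the plain split-conformal recipe for classification: calibrate a threshold on held-out nonconformity scores, then return every class whose score falls under it. This is the textbook procedure, not CPATTA's smoothed scores or online weight updates; the score 1 - p(class) and all numbers are assumptions for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha: float) -> float:
    """Finite-sample quantile of calibration scores for target coverage 1 - alpha."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: every class is kept
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat: float):
    """Classes whose nonconformity score 1 - p does not exceed the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]


# Made-up calibration scores (1 - probability of the true class).
cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
qhat = conformal_threshold(cal, alpha=0.5)
print(prediction_set([0.6, 0.3, 0.1], qhat))
```

A larger prediction set signals higher uncertainty, which is the kind of signal an active method can use to decide when a human label is worth requesting.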

IV | May 30, 2025 | Code
DLiPath: A Benchmark for the Comprehensive Assessment of Donor Liver Based on Histopathological Image Dataset

Liangrui Pan, Xingchen Li, Zhongyi Chen et al.

Pathologists' comprehensive evaluation of donor liver biopsies provides crucial information for accepting or discarding potential grafts. However, rapidly and accurately obtaining these assessments intraoperatively poses a significant challenge for pathologists. Features in donor liver biopsies, such as portal tract fibrosis, total steatosis, macrovesicular steatosis, and hepatocellular ballooning, are correlated with transplant outcomes, yet quantifying these indicators suffers from substantial inter- and intra-observer variability. To address this, we introduce DLiPath, the first benchmark for comprehensive donor liver assessment based on a histopathology image dataset. We collected and publicly released 636 whole slide images from 304 donor liver patients at the Department of Pathology, the Third Xiangya Hospital, with expert annotations for key pathological features (including cholestasis, portal tract fibrosis, portal inflammation, total steatosis, macrovesicular steatosis, and hepatocellular ballooning). We selected nine state-of-the-art multiple-instance learning (MIL) models based on the DLiPath dataset as baselines for extensive comparative analysis. The experimental results demonstrate that several MIL models achieve high accuracy across donor liver assessment indicators on DLiPath, charting a clear course for future automated and intelligent donor liver assessment research. Data and code are available at https://github.com/panliangrui/ACM_MM_2025.

GN | May 30, 2025 | Code
PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset

Liangrui Pan, Qingchun Liang, Shen Zhao et al.

Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment. Faced with regional disparities in medical resources and the high cost of genomic assays, using artificial intelligence to infer these mutations and exon variants from routine histopathology images could greatly facilitate precision therapy. Although some prior studies have shown that deep learning can accelerate the prediction of key gene mutations from lung cancer pathology slides, their performance remains suboptimal and has so far been limited mainly to early screening tasks. To address these limitations, we have assembled PathGene, which comprises histopathology images paired with next-generation sequencing reports from 1,576 patients at the Second Xiangya Hospital, Central South University, and 448 TCGA-LUAD patients. This multi-center dataset links whole-slide images to driver gene mutation status, mutation subtypes, exon, and tumor mutational burden (TMB) status, with the goal of leveraging pathology images to predict mutations, subtypes, exon locations, and TMB for early genetic screening and to advance precision oncology. Unlike existing datasets, we provide molecular-level information related to histopathology images in PathGene to facilitate the development of biomarker prediction models. We benchmarked 11 multiple-instance learning methods on PathGene for mutation, subtype, exon, and TMB prediction tasks. These experimental methods provide valuable alternatives for early genetic screening of lung cancer patients and assisting clinicians to quickly develop personalized precision targeted treatment plans for patients. Code and data are available at https://github.com/panliangrui/NIPS2025/.

CV | Mar 14, 2024 | Code
SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival

Liangrui Pan, Yijun Peng, Yan Li et al.

Accurately predicting the survival rate of cancer patients is crucial for aiding clinicians in planning appropriate treatment, reducing cancer-related medical expenses, and significantly enhancing patients' quality of life. Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach. However, existing methods still grapple with challenges related to missing multimodal data and information interaction within modalities. This paper introduces SELECTOR, a heterogeneous graph-aware network based on convolutional mask encoders for robust multimodal prediction of cancer patient survival. SELECTOR comprises feature edge reconstruction, convolutional mask encoder, feature cross-fusion, and multimodal survival prediction modules. Initially, we construct a multimodal heterogeneous graph and employ the meta-path method for feature edge reconstruction, ensuring comprehensive incorporation of feature information from graph edges and effective embedding of nodes. To mitigate the impact of missing features within the modality on prediction accuracy, we devised a convolutional masked autoencoder (CMAE) to process the heterogeneous graph post-feature reconstruction. Subsequently, the feature cross-fusion module facilitates communication between modalities, ensuring that output features encompass all features of the modality and relevant information from other modalities. Extensive experiments and analysis on six cancer datasets from TCGA demonstrate that our method significantly outperforms state-of-the-art methods in both modality-missing and intra-modality information-confirmed cases. Our codes are made available at https://github.com/panliangrui/Selector.

IV | Aug 22, 2021 | Code
FEDI: Few-shot learning based on Earth Mover's Distance algorithm combined with deep residual network to identify diabetic retinopathy

Liangrui Pan, Boya Ji, Peng Xi et al.

Diabetic retinopathy (DR) is the main cause of blindness in diabetic patients. However, the onset of blindness can readily be delayed through fundus diagnosis of DR. In practice, it is difficult to collect a large amount of diabetic retina data in clinical settings. This paper proposes a few-shot learning model that combines a deep residual network with the Earth Mover's Distance algorithm to assist in diagnosing DR. We build training and validation classification tasks for few-shot learning based on 39 categories of 1000 sample data, train deep residual networks, and obtain experience maximization pre-training models. Based on the weights of the pre-trained model, the Earth Mover's Distance algorithm calculates the distance between images, obtains the similarity between them, and changes the model's parameters to improve the accuracy of the trained model. Finally, we construct small-sample classification tasks on the test set to further optimize the model, achieving an accuracy of 93.5667% on the 3way10shot task of the diabetic retina test set. For the experimental code and results, please refer to: https://github.com/panliangrui/few-shot-learning-funds.
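
As a minimal illustration of the Earth Mover's Distance, the sketch below computes it for two one-dimensional histograms on the same unit-spaced bins, where it reduces to the summed absolute difference of the cumulative distributions. This is a deliberately simplified case: the paper compares images, which presumably involves matching sets of learned features rather than 1D histograms, and the function name is hypothetical.

```python
def emd_1d(p, q) -> float:
    """EMD between two histograms defined on the same unit-spaced bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sum_p, sum_q = sum(p), sum(q)
    cdf_gap = 0.0   # running difference between the two normalized CDFs
    total = 0.0
    for a, b in zip(p, q):
        cdf_gap += a / sum_p - b / sum_q
        total += abs(cdf_gap)
    return total


# Moving all mass two bins to the right costs 2 units of work.
print(emd_1d([1, 0, 0], [0, 0, 1]))
```

A smaller distance means the two distributions are more similar, which is how a distance of this kind can serve as a similarity score between a query and a support example.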

LG | Jul 20, 2021 | Code
Heterogeneous network-based drug repurposing for COVID-19

Shuting Jin, Xiangxiang Zeng, Wei Huang et al.

Coronavirus disease 2019 (COVID-19) is caused by a human coronavirus (HCoV) and has spread rapidly around the world. Compared with new drug development, drug repurposing may be the best shortcut for treating COVID-19. Therefore, we constructed a comprehensive heterogeneous network based on HCoV-related target proteins and used the previously proposed deepDTnet to discover potential drug candidates for COVID-19. We obtained high performance in predicting drugs that may be effective against COVID-19-related proteins. In summary, this work utilizes a powerful heterogeneous network-based deep learning method, which may help to quickly identify candidate repurposable drugs for future clinical trials for COVID-19. The code and data are available at https://github.com/stjin-XMU/HnDR-COVID.

CV | Mar 24, 2024
Opportunities and challenges in the application of large artificial intelligence models in radiology

Liangrui Pan, Zhenyu Zhao, Ying Lu et al.

Influenced by ChatGPT, research and development of large artificial intelligence (AI) models has seen a global upsurge. As people enjoy the convenience brought by these large AI models, more and more large models for specialized fields are being proposed, especially in the radiology imaging field. This article first introduces the development history of large models, their technical details and workflow, and the working principles of multimodal large models and video generation large models. Secondly, we summarize the latest research progress of large AI models in radiology education, radiology report generation, and unimodal and multimodal radiology applications. Finally, this paper summarizes some of the challenges of large AI models in radiology, with the aim of better promoting the rapid revolution in the field of radiography.

IV | Nov 22, 2024
Feature-interactive Siamese graph encoder-based image analysis to predict STAS from histopathology images in lung cancer

Liangrui Pan, Qingchun Liang, Wenwu Zeng et al.

Spread through air spaces (STAS) is a distinct invasion pattern in lung cancer, crucial for prognosis assessment and guiding surgical decisions. Histopathology is the gold standard for STAS detection, yet traditional methods are subjective, time-consuming, and prone to misdiagnosis, limiting large-scale applications. We present VERN, an image analysis model utilizing a feature-interactive Siamese graph encoder to predict STAS from lung cancer histopathological images. VERN captures spatial topological features with feature sharing and skip connections to enhance model training. Using 1,546 histopathology slides, we built a large single-cohort STAS lung cancer dataset. VERN achieved an AUC of 0.9215 in internal validation and AUCs of 0.8275 and 0.8829 in frozen and paraffin-embedded test sections, respectively, demonstrating clinical-grade performance. Validated on a single-cohort and three external datasets, VERN showed robust predictive performance and generalizability, providing an open platform (http://plr.20210706.xyz:5000/) to enhance STAS diagnosis efficiency and accuracy.
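
Since the results above are reported as AUC, a short sketch of the metric may help: the area under the ROC curve equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counted as one half. The function name and the example labels and scores are made up, and this is not the evaluation code used for VERN.

```python
def roc_auc(labels, scores) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```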

CV | Nov 7, 2024
FedDP: Privacy-preserving method based on federated learning for histopathology image segmentation

Liangrui Pan, Mao Huang, Lian Wang et al.

Hematoxylin and Eosin (H&E) staining of whole slide images (WSIs) is considered the gold standard for pathologists and medical practitioners for tumor diagnosis, surgical planning, and post-operative assessment. With the rapid advancement of deep learning technologies, numerous models based on convolutional neural networks and transformers have been applied to the precise segmentation of WSIs. However, due to privacy regulations and the need to protect patient confidentiality, centralized storage and processing of image data are impractical, and training a centralized model directly is difficult to implement in medical settings. This paper addresses the dispersed nature and privacy sensitivity of medical image data by employing a federated learning framework, allowing medical institutions to learn collaboratively while protecting patient privacy. Additionally, to prevent reconstruction of the original data through gradient inversion during federated training, differential privacy introduces noise into the model updates, preventing attackers from inferring the contributions of individual samples and thereby protecting the privacy of the training data. Experimental results show that the proposed method, FedDP, minimally impacts model accuracy while effectively safeguarding the privacy of cancer pathology image data, with only a slight decrease in Dice, Jaccard, and Acc indices of 0.55%, 0.63%, and 0.42%, respectively. This approach facilitates cross-institutional collaboration and knowledge sharing while protecting sensitive data privacy, providing a viable solution for further research and application in the medical field.
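
The noise-injection step described above is commonly implemented by clipping each client update to a maximum L2 norm and then adding Gaussian noise scaled to that norm. The sketch below shows this generic recipe under stated assumptions: the function names, the clipping bound, and the noise multiplier are illustrative and are not FedDP's actual settings.

```python
import math
import random


def clip_update(update, max_norm: float):
    """Scale the update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize(update, max_norm: float, noise_multiplier: float, rng: random.Random):
    """Clip the update, then add Gaussian noise with std noise_multiplier * max_norm."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


rng = random.Random(0)  # fixed seed so the illustration is repeatable
print(privatize([3.0, 4.0], max_norm=1.0, noise_multiplier=0.5, rng=rng))
```

Clipping bounds how much any single update can influence the aggregate, and the noise then masks the remaining contribution; a larger noise multiplier gives stronger privacy at the cost of accuracy.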

CV | Aug 11, 2025
Spatial-ORMLLM: Improve Spatial Relation Understanding in the Operating Room with Multimodal Large Language Model

Peiqi He, Zhenhao Zhang, Yixiang Zhang et al.

Precise spatial modeling in the operating room (OR) is foundational to many clinical tasks, supporting intraoperative awareness, hazard avoidance, and surgical decision-making. While existing approaches leverage large-scale multimodal datasets for latent-space alignment to implicitly learn spatial relationships, they overlook the 3D capabilities of MLLMs. However, this approach raises two issues: (1) Operating rooms typically lack multiple video and audio sensors, making multimodal 3D data difficult to obtain; (2) Training solely on readily available 2D data fails to capture fine-grained details in complex scenes. To address this gap, we introduce Spatial-ORMLLM, the first large vision-language model for 3D spatial reasoning in operating rooms using only RGB modality to infer volumetric and semantic cues, enabling downstream medical tasks with detailed and holistic spatial context. Spatial-ORMLLM incorporates a Spatial-Enhanced Feature Fusion Block, which integrates 2D modality inputs with rich 3D spatial knowledge extracted by the estimation algorithm and then feeds the combined features into the visual tower. By employing a unified end-to-end MLLM framework, it combines powerful spatial features with textual features to deliver robust 3D scene reasoning without any additional expert annotations or sensor inputs. Experiments on multiple benchmark clinical datasets demonstrate that Spatial-ORMLLM achieves state-of-the-art performance and generalizes robustly to previously unseen surgical scenarios and downstream tasks.

CV | Mar 18, 2025
SMILE: a Scale-aware Multiple Instance Learning Method for Multicenter STAS Lung Cancer Histopathology Diagnosis

Liangrui Pan, Xiaoyu Li, Yutao Dou et al.

Spread through air spaces (STAS) represents a newly identified aggressive pattern in lung cancer, which is known to be associated with adverse prognostic factors and complex pathological features. Pathologists currently rely on time-consuming manual assessments, which are highly subjective and prone to variation. This highlights the urgent need for automated and precise diagnostic solutions. We collected 2,970 lung cancer tissue slides from multiple centers, re-diagnosed them, and constructed and publicly released three lung cancer STAS datasets: STAS CSU (hospital), STAS TCGA, and STAS CPTAC. All STAS datasets provide corresponding pathological feature diagnoses and related clinical data. To address the biased, sparse, and heterogeneous nature of STAS, we propose a scale-aware multiple instance learning (SMILE) method for STAS diagnosis of lung cancer. By introducing a scale-adaptive attention mechanism, SMILE can adaptively adjust high-attention instances, reducing over-reliance on local regions and promoting consistent detection of STAS lesions. Extensive experiments show that SMILE achieved competitive diagnostic results on STAS CSU, diagnosing 251 and 319 STAS samples in CPTAC and TCGA, respectively, surpassing clinical average AUC. The 11 open baseline results are the first to be established for STAS research, laying the foundation for the future expansion, interpretability, and clinical integration of computational pathology technologies. The datasets and code are available at https://anonymous.4open.science/r/IJCAI25-1DA1.

CV | May 13, 2024
FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival

Liangrui Pan, Yijun Peng, Yan Li et al.

Integrating the different data modalities of cancer patients can significantly improve the predictive performance of patient survival. However, most existing methods ignore the simultaneous utilization of rich semantic features at different scales in pathology images. When collecting multimodal data and extracting features, there is a likelihood of encountering intra-modality missing data, introducing noise into the multimodal data. To address these challenges, this paper proposes a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information. Specifically, the cross-fusion transformer effectively utilizes features at the cellular level, tissue level, and tumor heterogeneity level to correlate prognosis through a cross-scale feature cross-fusion method. This enhances the ability of pathological image feature representation. Secondly, the hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features and local detail features of the molecular data. HAE's channel attention module obtains global features of molecular data. Furthermore, to address the issue of missing information within modalities, we propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods on four benchmark datasets in both complete and missing settings.

CV | Aug 14, 2025
STAMP: Multi-pattern Attention-aware Multiple Instance Learning for STAS Diagnosis in Multi-center Histopathology Images

Liangrui Pan, Xiaoyu Li, Guang Zhu et al.

Spread through air spaces (STAS) constitutes a novel invasive pattern in lung adenocarcinoma (LUAD), associated with tumor recurrence and diminished survival rates. However, large-scale STAS diagnosis in LUAD remains a labor-intensive endeavor, compounded by the propensity for oversight and misdiagnosis due to its distinctive pathological characteristics and morphological features. Consequently, there is a pressing clinical imperative to leverage deep learning models for STAS diagnosis. This study first assembled histopathological images from STAS patients at the Second Xiangya Hospital and the Third Xiangya Hospital of Central South University, alongside the TCGA-LUAD cohort. Three senior pathologists conducted cross-verification annotations to construct the STAS-SXY, STAS-TXY, and STAS-TCGA datasets. We then propose a multi-pattern attention-aware multiple instance learning framework, named STAMP, to analyze and diagnose the presence of STAS across multi-center histopathology images. Specifically, the dual-branch architecture guides the model to learn STAS-associated pathological features from distinct semantic spaces. Transformer-based instance encoding and a multi-pattern attention aggregation module dynamically select regions closely associated with STAS pathology, suppressing irrelevant noise and enhancing the discriminative power of global representations. Moreover, a similarity regularization constraint prevents feature redundancy across branches, thereby improving overall diagnostic accuracy. Extensive experiments demonstrated that STAMP achieved competitive diagnostic results on STAS-SXY, STAS-TXY, and STAS-TCGA, with AUCs of 0.8058, 0.8017, and 0.7928, respectively, surpassing the clinical level.
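
For context, the sketch below shows the basic attention pooling commonly used in multiple instance learning: each patch embedding gets a score w · tanh(V h), the scores are normalized with a softmax, and the slide-level representation is the weighted sum of the patches. It is a generic baseline, not STAMP's dual-branch or multi-pattern aggregation; the function name, the projection `V`, the vector `w`, and the toy inputs are all assumptions.

```python
import math


def attention_pool(instances, V, w):
    """Pool instance vectors into one bag vector using learned attention weights.

    instances: list of feature vectors (one per patch)
    V: projection matrix as a list of rows (hidden x dim)
    w: scoring vector of length hidden
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # softmax with max-shift for stability
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag with three 2-dimensional patch embeddings and made-up parameters.
bag, weights = attention_pool([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
                              V=[[1.0, 0.0], [0.0, 1.0]], w=[1.0, 1.0])
print(bag, weights)
```

The attention weights double as a rough interpretability signal, since patches with larger weights contribute more to the slide-level prediction.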

LG | Jan 12, 2022
Multi-task Joint Strategies of Self-supervised Representation Learning on Biomedical Networks for Drug Discovery

Xiaoqi Wang, Yingjie Cheng, Yaning Yang et al.

Self-supervised representation learning (SSL) on biomedical networks provides new opportunities for drug discovery. However, how to effectively combine multiple SSL models is still challenging and has rarely been explored. Therefore, we propose multi-task joint strategies of self-supervised representation learning on biomedical networks for drug discovery, named MSSL2drug. We design six basic SSL tasks inspired by various modality features including structures, semantics, and attributes in heterogeneous biomedical networks. Importantly, fifteen combinations of multiple tasks are evaluated by a graph attention-based multi-task adversarial learning framework in two drug discovery scenarios. The results suggest two important findings. (1) Combinations of multimodal tasks achieve the best performance compared to other multi-task joint models. (2) The local-global combination models yield higher performance than random two-task combinations when the number of modalities is the same. Therefore, we conjecture that the multimodal and local-global combination strategies can be treated as a guideline for multi-task SSL in drug discovery.