IVOct 19, 2023Code
DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image SegmentationGuanqun Sun, Yizhi Pan, Weikun Kong et al.
Accurate medical image segmentation is critical for disease quantification and treatment evaluation. While traditional Unet architectures and their transformer-integrated variants excel in automated segmentation tasks. However, they lack the ability to harness the intrinsic position and channel features of image. Existing models also struggle with parameter efficiency and computational complexity, often due to the extensive use of Transformers. To address these issues, this study proposes a novel deep medical image segmentation framework, called DA-TransUNet, aiming to integrate the Transformer and dual attention block(DA-Block) into the traditional U-shaped architecture. Unlike earlier transformer-based U-net models, DA-TransUNet utilizes Transformers and DA-Block to integrate not only global and local features, but also image-specific positional and channel features, improving the performance of medical image segmentation. By incorporating a DA-Block at the embedding layer and within each skip connection layer, we substantially enhance feature extraction capabilities and improve the efficiency of the encoder-decoder structure. DA-TransUNet demonstrates superior performance in medical image segmentation tasks, consistently outperforming state-of-the-art techniques across multiple datasets. In summary, DA-TransUNet offers a significant advancement in medical image segmentation, providing an effective and powerful alternative to existing techniques. Our architecture stands out for its ability to improve segmentation accuracy, thereby advancing the field of automated medical image diagnostics. The codes and parameters of our model will be publicly available at https://github.com/SUN-1024/DA-TransUnet.
IVMay 27, 2022
Lesion classification by model-based feature extraction: A differential affine invariant model of soft tissue elasticityWeiguo Cao, Marc J. Pomeroy, Zhengrong Liang et al.
The elasticity of soft tissues has been widely considered as a characteristic property to differentiate between healthy and vicious tissues and, therefore, motivated several elasticity imaging modalities, such as Ultrasound Elastography, Magnetic Resonance Elastography, and Optical Coherence Elastography. This paper proposes an alternative approach of modeling the elasticity using Computed Tomography (CT) imaging modality for model-based feature extraction machine learning (ML) differentiation of lesions. The model describes a dynamic non-rigid (or elastic) deformation in differential manifold to mimic the soft tissues elasticity under wave fluctuation in vivo. Based on the model, three local deformation invariants are constructed by two tensors defined by the first and second order derivatives from the CT images and used to generate elastic feature maps after normalization via a novel signal suppression method. The model-based elastic image features are extracted from the feature maps and fed to machine learning to perform lesion classifications. Two pathologically proven image datasets of colon polyps (44 malignant and 43 benign) and lung nodules (46 malignant and 20 benign) were used to evaluate the proposed model-based lesion classification. The outcomes of this modeling approach reached the score of area under the curve of the receiver operating characteristics of 94.2 % for the polyps and 87.4 % for the nodules, resulting in an average gain of 5 % to 30 % over ten existing state-of-the-art lesion classification methods. The gains by modeling tissue elasticity for ML differentiation of lesions are striking, indicating the great potential of exploring the modeling strategy to other tissue properties for ML differentiation of lesions.
CVApr 16Code
Precision Synthesis of Multi-Tracer PET via VLM-Modulated Rectified Flow for Stratifying Mild Cognitive ImpairmentTuo Liu, Shuijin Lin, Shaozhen Yan et al.
The biological definition of Alzheimer's disease (AD) relies on multi-modal neuroimaging, yet the clinical utility of positron emission tomography (PET) is limited by cost and radiation exposure, hindering early screening at preclinical or prodromal stages. While generative models offer a promising alternative by synthesizing PET from magnetic resonance imaging (MRI), achieving subject-specific precision remains a primary challenge. Here, we introduce DIReCT$++$, a Domain-Informed ReCTified flow model for synthesizing multi-tracer PET from MRI combined with fundamental clinical information. Our approach integrates a 3D rectified flow architecture to capture complex cross-modal and cross-tracer relationships with a domain-adapted vision-language model (BiomedCLIP) that provides text-guided, personalized generation using clinical scores and imaging knowledge. Extensive evaluations on multi-center datasets demonstrate that DIReCT$++$ not only produces synthetic PET images ($^{18}$F-AV-45 and $^{18}$F-FDG) of superior fidelity and generalizability but also accurately recapitulates disease-specific patterns. Crucially, combining these synthesized PET images with MRI enables precise personalized stratification of mild cognitive impairment (MCI), advancing a scalable, data-efficient tool for the early diagnosis and prognostic prediction of AD. The source code will be released on https://github.com/ladderlab-xjtu/DIReCT-PLUS.
CVFeb 6Code
DuMeta++: Spatiotemporal Dual Meta-Learning for Generalizable Few-Shot Brain Tissue Segmentation Across Diverse AgesYongheng Sun, Jun Shu, Jianhua Ma et al.
Accurate segmentation of brain tissues from MRI scans is critical for neuroscience and clinical applications, but achieving consistent performance across the human lifespan remains challenging due to dynamic, age-related changes in brain appearance and morphology. While prior work has sought to mitigate these shifts by using self-supervised regularization with paired longitudinal data, such data are often unavailable in practice. To address this, we propose \emph{DuMeta++}, a dual meta-learning framework that operates without paired longitudinal data. Our approach integrates: (1) meta-feature learning to extract age-agnostic semantic representations of spatiotemporally evolving brain structures, and (2) meta-initialization learning to enable data-efficient adaptation of the segmentation model. Furthermore, we propose a memory-bank-based class-aware regularization strategy to enforce longitudinal consistency without explicit longitudinal supervision. We theoretically prove the convergence of our DuMeta++, ensuring stability. Experiments on diverse datasets (iSeg-2019, IBIS, OASIS, ADNI) under few-shot settings demonstrate that DuMeta++ outperforms existing methods in cross-age generalization. Code will be available at https://github.com/ladderlab-xjtu/DuMeta++.
IVJul 20, 2024
Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learningChen Shen, Chunfeng Lian, Wanqing Zhang et al.
Forensic pathology is critical in determining the cause and manner of death through post-mortem examinations, both macroscopic and microscopic. The field, however, grapples with issues such as outcome variability, laborious processes, and a scarcity of trained professionals. This paper presents SongCi, an innovative visual-language model (VLM) designed specifically for forensic pathology. SongCi utilizes advanced prototypical cross-modal self-supervised contrastive learning to enhance the accuracy, efficiency, and generalizability of forensic analyses. It was pre-trained and evaluated on a comprehensive multi-center dataset, which includes over 16 million high-resolution image patches, 2,228 vision-language pairs of post-mortem whole slide images (WSIs), and corresponding gross key findings, along with 471 distinct diagnostic outcomes. Our findings indicate that SongCi surpasses existing multi-modal AI models in many forensic pathology tasks, performs comparably to experienced forensic pathologists and significantly better than less experienced ones, and provides detailed multi-modal explainability, offering critical assistance in forensic investigations. To the best of our knowledge, SongCi is the first VLM specifically developed for forensic pathological analysis and the first large-vocabulary computational pathology (CPath) model that directly processes gigapixel WSIs in forensic science.
IVJun 27, 2025Code
Noise-Inspired Diffusion Model for Generalizable Low-Dose CT ReconstructionQi Gao, Zhihao Chen, Dong Zeng et al.
The generalization of deep learning-based low-dose computed tomography (CT) reconstruction models to doses unseen in the training data is important and remains challenging. Previous efforts heavily rely on paired data to improve the generalization performance and robustness through collecting either diverse CT data for re-training or a few test data for fine-tuning. Recently, diffusion models have shown promising and generalizable performance in low-dose CT (LDCT) reconstruction, however, they may produce unrealistic structures due to the CT image noise deviating from Gaussian distribution and imprecise prior information from the guidance of noisy LDCT images. In this paper, we propose a noise-inspired diffusion model for generalizable LDCT reconstruction, termed NEED, which tailors diffusion models for noise characteristics of each domain. First, we propose a novel shifted Poisson diffusion model to denoise projection data, which aligns the diffusion process with the noise model in pre-log LDCT projections. Second, we devise a doubly guided diffusion model to refine reconstructed images, which leverages LDCT images and initial reconstructions to more accurately locate prior information and enhance reconstruction fidelity. By cascading these two diffusion models for dual-domain reconstruction, our NEED requires only normal-dose data for training and can be effectively extended to various unseen dose levels during testing via a time step matching strategy. Extensive qualitative, quantitative, and segmentation-based evaluations on two datasets demonstrate that our NEED consistently outperforms state-of-the-art methods in reconstruction and generalization performance. Source code is made available at https://github.com/qgao21/NEED.
CVApr 8
Vision-Language Model-Guided Deep Unrolling Enables Personalized, Fast MRIFangmao Ju, Yuzhu He, Zhiwen Xue et al.
Magnetic Resonance Imaging (MRI) is a cornerstone in medicine and healthcare but suffers from long acquisition times. Traditional accelerated MRI methods optimize for generic image quality, lacking adaptability for specific clinical tasks. To address this, we introduce PASS (Personalized, Anomaly-aware Sampling and reconStruction), an intelligent MRI framework that leverages a Vision-Language Model (VLM) to guide a deep unrolling network for task-oriented, fast imaging. PASS dynamically personalizes the imaging pipeline through three core contributions: (1) a deep unrolled reconstruction network derived from a physics-based MRI model; (2) a sampling module that generates patient-specific $k$-space trajectories; and (3) an anomaly-aware prior, extracted from a pretrained VLM, which steers both sampling and reconstruction toward clinically relevant regions. By integrating the high-level clinical reasoning of a VLM with an interpretable, physics-aware network, PASS achieves superior image quality across diverse anatomies, contrasts, anomalies, and acceleration factors. This enhancement directly translates to improvements in downstream diagnostic tasks, including fine-grained anomaly detection, localization, and diagnosis.
CVAug 25, 2025Code
UniSino: Physics-Driven Foundational Model for Universal CT Sinogram StandardizationXingyu Ai, Shaoyu Wang, Zhiyuan Jia et al.
During raw-data acquisition in CT imaging, diverse factors can degrade the collected sinograms, with undersampling and noise leading to severe artifacts and noise in reconstructed images and compromising diagnostic accuracy. Conventional correction methods rely on manually designed algorithms or fixed empirical parameters, but these approaches often lack generalizability across heterogeneous artifact types. To address these limitations, we propose UniSino, a foundation model for universal CT sinogram standardization. Unlike existing foundational models that operate in image domain, UniSino directly standardizes data in the projection domain, which enables stronger generalization across diverse undersampling scenarios. Its training framework incorporates the physical characteristics of sinograms, enhancing generalization and enabling robust performance across multiple subtasks spanning four benchmark datasets. Experimental results demonstrate thatUniSino achieves superior reconstruction quality both single and mixed undersampling case, demonstrating exceptional robustness and generalization in sinogram enhancement for CT imaging. The code is available at: https://github.com/yqx7150/UniSino.
AIAug 11, 2025
FEAT: A Multi-Agent Forensic AI System with Domain-Adapted Large Language Model for Automated Cause-of-Death AnalysisChen Shen, Wanqing Zhang, Kehan Li et al.
Forensic cause-of-death determination faces systemic challenges, including workforce shortages and diagnostic variability, particularly in high-volume systems like China's medicolegal infrastructure. We introduce FEAT (ForEnsic AgenT), a multi-agent AI framework that automates and standardizes death investigations through a domain-adapted large language model. FEAT's application-oriented architecture integrates: (i) a central Planner for task decomposition, (ii) specialized Local Solvers for evidence analysis, (iii) a Memory & Reflection module for iterative refinement, and (iv) a Global Solver for conclusion synthesis. The system employs tool-augmented reasoning, hierarchical retrieval-augmented generation, forensic-tuned LLMs, and human-in-the-loop feedback to ensure legal and medical validity. In evaluations across diverse Chinese case cohorts, FEAT outperformed state-of-the-art AI systems in both long-form autopsy analyses and concise cause-of-death conclusions. It demonstrated robust generalization across six geographic regions and achieved high expert concordance in blinded validations. Senior pathologists validated FEAT's outputs as comparable to those of human experts, with improved detection of subtle evidentiary nuances. To our knowledge, FEAT is the first LLM-based AI agent system dedicated to forensic medicine, offering scalable, consistent death certification while maintaining expert-level rigor. By integrating AI efficiency with human oversight, this work could advance equitable access to reliable medicolegal services while addressing critical capacity constraints in forensic systems.
IVMay 3, 2025
Continuous Filtered Backprojection by Learnable Interpolation NetworkHui Lin, Dong Zeng, Qi Xie et al.
Accurate reconstruction of computed tomography (CT) images is crucial in medical imaging field. However, there are unavoidable interpolation errors in the backprojection step of the conventional reconstruction methods, i.e., filtered-back-projection based methods, which are detrimental to the accurate reconstruction. In this study, to address this issue, we propose a novel deep learning model, named Leanable-Interpolation-based FBP or LInFBP shortly, to enhance the reconstructed CT image quality, which achieves learnable interpolation in the backprojection step of filtered backprojection (FBP) and alleviates the interpolation errors. Specifically, in the proposed LInFBP, we formulate every local piece of the latent continuous function of discrete sinogram data as a linear combination of selected basis functions, and learn this continuous function by exploiting a deep network to predict the linear combination coefficients. Then, the learned latent continuous function is exploited for interpolation in backprojection step, which first time takes the advantage of deep learning for the interpolation in FBP. Extensive experiments, which encompass diverse CT scenarios, demonstrate the effectiveness of the proposed LInFBP in terms of enhanced reconstructed image quality, plug-and-play ability and generalization capability.
IVDec 31, 2024
SS-CTML: Self-Supervised Cross-Task Mutual Learning for CT Image ReconstructionGaofeng Chen, Yaoduo Zhang, Li Huang et al.
Supervised deep-learning (SDL) techniques with paired training datasets have been widely studied for X-ray computed tomography (CT) image reconstruction. However, due to the difficulties of obtaining paired training datasets in clinical routine, the SDL methods are still away from common uses in clinical practices. In recent years, self-supervised deep-learning (SSDL) techniques have shown great potential for the studies of CT image reconstruction. In this work, we propose a self-supervised cross-task mutual learning (SS-CTML) framework for CT image reconstruction. Specifically, a sparse-view scanned and a limited-view scanned sinogram data are first extracted from a full-view scanned sinogram data, which results in three individual reconstruction tasks, i.e., the full-view CT (FVCT) reconstruction, the sparse-view CT (SVCT) reconstruction, and limited-view CT (LVCT) reconstruction. Then, three neural networks are constructed for the three reconstruction tasks. Considering that the ultimate goals of the three tasks are all to reconstruct high-quality CT images, we therefore construct a set of cross-task mutual learning objectives for the three tasks, in which way, the three neural networks can be self-supervised optimized by learning from each other. Clinical datasets are adopted to evaluate the effectiveness of the proposed framework. Experimental results demonstrate that the SS-CTML framework can obtain promising CT image reconstruction performance in terms of both quantitative and qualitative measurements.
SIAug 9, 2020
Improving Smart Conference Participation through Socially-Aware RecommendationNana Yaw Asabere, Feng Xia, Wei Wang et al.
This research addresses recommending presentation sessions at smart conferences to participants. We propose a venue recommendation algorithm, Socially-Aware Recommendation of Venues and Environments (SARVE). SARVE computes correlation and social characteristic information of conference participants. In order to model a recommendation process using distributed community detection, SARVE further integrates the current context of both the smart conference community and participants. SARVE recommends presentation sessions that may be of high interest to each participant. We evaluate SARVE using a real world dataset. In our experiments, we compare SARVE to two related state-of-the-art methods, namely: Context-Aware Mobile Recommendation Services (CAMRS) and Conference Navigator (Recommender) Model. Our experimental results show that in terms of the utilized evaluation metrics: precision, recall, and f-measure, SARVE achieves more reliable and favorable social (relations and context) recommendation results.
IVOct 14, 2019
Direct Energy-resolving CT Imaging via Energy-integrating CT images using a Unified Generative Adversarial NetworkLisha Yao, Sui Li, Manman Zhu et al.
Energy-resolving computed tomography (ErCT) has the ability to acquire energy-dependent measurements simultaneously and quantitative material information with improved contrast-to-noise ratio. Meanwhile, ErCT imaging system is usually equipped with an advanced photon counting detector, which is expensive and technically complex. Therefore, clinical ErCT scanners are not yet commercially available, and they are in various stage of completion. This makes the researchers less accessible to the ErCT images. In this work, we investigate to produce ErCT images directly from existing energy-integrating CT (EiCT) images via deep neural network. Specifically, different from other networks that produce ErCT images at one specific energy, this model employs a unified generative adversarial network (uGAN) to concurrently train EiCT datasets and ErCT datasets with different energies and then performs image-to-image translation from existing EiCT images to multiple ErCT image outputs at various energy bins. In this study, the present uGAN generates ErCT images at 70keV, 90keV, 110keV, and 130keV simultaneously from EiCT images at140kVp. We evaluate the present uGAN model on a set of over 1380 CT image slices and show that the present uGAN model can produce promising ErCT estimation results compared with the ground truth qualitatively and quantitatively.
STR-ELOct 4, 2018
Machine learning electron correlation in a disordered mediumJianhua Ma, Puhan Zhang, Yaohua Tan et al.
Learning from data has led to a paradigm shift in computational materials science. In particular, it has been shown that neural networks can learn the potential energy surface and interatomic forces through examples, thus bypassing the computationally expensive density functional theory calculations. Combining many-body techniques with a deep learning approach, we demonstrate that a fully-connected neural network is able to learn the complex collective behavior of electrons in strongly correlated systems. Specifically, we consider the Anderson-Hubbard (AH) model, which is a canonical system for studying the interplay between electron correlation and strong localization. The ground states of the AH model on a square lattice are obtained using the real-space Gutzwiller method. The obtained solutions are used to train a multi-task multi-layer neural network, which subsequently can accurately predict quantities such as the local probability of double occupation and the quasiparticle weight, given the disorder potential in the neighborhood as the input.
CVSep 7, 2018
Predicting Lung Nodule Malignancies by Combining Deep Convolutional Neural Network and Handcrafted FeaturesShulong Li, Panpan Xu, Bin Li et al.
To predict lung nodule malignancy with a high sensitivity and specificity, we propose a fusion algorithm that combines handcrafted features (HF) into the features learned at the output layer of a 3D deep convolutional neural network (CNN). First, we extracted twenty-nine handcrafted features, including nine intensity features, eight geometric features, and twelve texture features based on grey-level co-occurrence matrix (GLCM) averaged from thirteen directions. We then trained 3D CNNs modified from three state-of-the-art 2D CNN architectures (AlexNet, VGG-16 Net and Multi-crop Net) to extract the CNN features learned at the output layer. For each 3D CNN, the CNN features combined with the 29 handcrafted features were used as the input for the support vector machine (SVM) coupled with the sequential forward feature selection (SFS) method to select the optimal feature subset and construct the classifiers. The fusion algorithm takes full advantage of the handcrafted features and the highest level CNN features learned at the output layer. It can overcome the disadvantage of the handcrafted features that may not fully reflect the unique characteristics of a particular lesion by combining the intrinsic CNN features. Meanwhile, it also alleviates the requirement of a large scale annotated dataset for the CNNs based on the complementary of handcrafted features. The patient cohort includes 431 malignant nodules and 795 benign nodules extracted from the LIDC/IDRI database. For each investigated CNN architecture, the proposed fusion algorithm achieved the highest AUC, accuracy, sensitivity, and specificity scores among all competitive classification models.
CVAug 9, 2018
Radon Inversion via Deep LearningJi He, Jianhua Ma
Radon transform is widely used in physical and life sciences and one of its major applications is the X-ray computed tomography (X-ray CT), which is significant in modern health examination. The Radon inversion or image reconstruction is challenging due to the potentially defective radon projections. Conventionally, the reconstruction process contains several ad hoc stages to approximate the corresponding Radon inversion. Each of the stages is highly dependent on the results of the previous stage. In this paper, we propose a novel unified framework for Radon inversion via deep learning (DL). The Radon inversion can be approximated by the proposed framework with an end-to-end fashion instead of processing step-by-step with multiple stages. For simplicity, the proposed framework is short as iRadonMap (inverse Radon transform approximation). Specifically, we implement the iRadonMap as an appropriative neural network, of which the architecture can be divided into two segments. In the first segment, a learnable fully-connected filtering layer is used to filter the radon projections along the view-angle direction, which is followed by a learnable sinusoidal back-projection layer to transfer the filtered radon projections into an image. The second segment is a common neural network architecture to further improve the reconstruction performance in the image domain. The iRadonMap is overall optimized by training a large number of generic images from ImageNet database. To evaluate the performance of the iRadonMap, clinical patient data is used. Qualitative results show promising reconstruction performance of the iRadonMap.
MED-PHDec 4, 2014
Statistical models and regularization strategies in statistical image reconstruction of low-dose X-ray CT: a surveyHao Zhang, Jing Wang, Jianhua Ma et al.
Statistical image reconstruction (SIR) methods have shown potential to substantially improve the image quality of low-dose X-ray computed tomography (CT) as compared to the conventional filtered back-projection (FBP) method for various clinical tasks. According to the maximum a posterior (MAP) estimation, the SIR methods can be typically formulated by an objective function consisting of two terms: (1) data-fidelity (or equivalently, data-fitting or data-mismatch) term modeling the statistics of projection measurements, and (2) regularization (or equivalently, prior or penalty) term reflecting prior knowledge or expectation on the characteristics of the image to be reconstructed. Existing SIR methods for low-dose CT can be divided into two groups: (1) those that use calibrated transmitted photon counts (before log-transform) with penalized maximum likelihood (pML) criterion, and (2) those that use calibrated line-integrals (after log-transform) with penalized weighted least-squares (PWLS) criterion. Accurate statistical modeling of the projection measurements is a prerequisite for SIR, while the regularization term in the objective function also plays a critical role for successful image reconstruction. This paper reviews several statistical models on CT projection measurements and various regularization strategies incorporating prior knowledge or expected properties of the image to be reconstructed, which together formulate the objective function of the SIR methods for low-dose X-ray CT.