MED-PHJun 10, 2011
Omni-tomography/Multi-tomography -- Integrating Multiple Modalities for Simultaneous ImagingGe Wang, Jie Zhang, Hao Gao et al.
Current tomographic imaging systems need major improvements, especially when multi-dimensional, multi-scale, multi-temporal and multi-parametric phenomena are under investigation. Both preclinical and clinical imaging now depend on in vivo tomography, often requiring separate evaluations by different imaging modalities to define morphologic details, delineate interval changes due to disease or interventions, and study physiological functions that have interconnected aspects. Over the past decade, fusion of multimodality images has emerged with two different approaches: post-hoc image registration and combined acquisition on PET-CT, PET-MRI and other hybrid scanners. There are intrinsic limitations for both the post-hoc image analysis and dual/triple modality approaches defined by registration errors and physical constraints in the acquisition chain. We envision that tomography will evolve beyond current modality fusion and towards grand fusion, a large scale fusion of all or many imaging modalities, which may be referred to as omni-tomography or multi-tomography. Unlike modality fusion, grand fusion is here proposed for truly simultaneous but often localized reconstruction in terms of all or many relevant imaging mechanisms such as CT, MRI, PET, SPECT, US, optical, and possibly more. In this paper, the technical basis for omni-tomography is introduced and illustrated with a top-level design of a next generation scanner, interior tomographic reconstructions of representative modalities, and anticipated applications of omni-tomography.
IVAug 16, 2023
Two-and-a-half Order Score-based Model for Solving 3D Ill-posed Inverse ProblemsZirong Li, Yanyang Wang, Jianjia Zhang et al.
Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are crucial technologies in the field of medical imaging. Score-based models have proven to be effective in addressing different inverse problems encountered in CT and MRI, such as sparse-view CT and fast MRI reconstruction. However, these models face challenges in achieving accurate three dimensional (3D) volumetric reconstruction. The existing score-based models primarily focus on reconstructing two dimensional (2D) data distribution, leading to inconsistencies between adjacent slices in the reconstructed 3D volumetric images. To overcome this limitation, we propose a novel two-and-a-half order score-based model (TOSM). During the training phase, our TOSM learns data distributions in 2D space, which reduces the complexity of training compared to directly working on 3D volumes. However, in the reconstruction phase, the TOSM updates the data distribution in 3D space, utilizing complementary scores along three directions (sagittal, coronal, and transaxial) to achieve a more precise reconstruction. The development of TOSM is built on robust theoretical principles, ensuring its reliability and efficacy. Through extensive experimentation on large-scale sparse-view CT and fast MRI datasets, our method demonstrates remarkable advancements and attains state-of-the-art results in solving 3D ill-posed inverse problems. Notably, the proposed TOSM effectively addresses the inter-slice inconsistency issue, resulting in high-quality 3D volumetric reconstruction.
CVSep 16, 2022
SQ-Swin: a Pretrained Siamese Quadratic Swin Transformer for Lettuce Browning PredictionDayang Wang, Boce Zhang, Yongshun Xu et al.
Packaged fresh-cut lettuce is widely consumed as a major component of vegetable salad owing to its high nutrition, freshness, and convenience. However, enzymatic browning discoloration on lettuce cut edges significantly reduces product quality and shelf life. While there are many research and breeding efforts underway to minimize browning, the progress is hindered by the lack of a rapid and reliable methodology to evaluate browning. Current methods to identify and quantify browning are either too subjective, labor intensive, or inaccurate. In this paper, we report a deep learning model for lettuce browning prediction. To the best of our knowledge, it is the first-of-its-kind on deep learning for lettuce browning prediction using a pretrained Siamese Quadratic Swin (SQ-Swin) transformer with several highlights. First, our model includes quadratic features in the transformer model which is more powerful to incorporate real-world representations than the linear transformer. Second, a multi-scale training strategy is proposed to augment the data and explore more of the inherent self-similarity of the lettuce images. Third, the proposed model uses a siamese architecture which learns the inter-relations among the limited training samples. Fourth, the model is pretrained on the ImageNet and then trained with the reptile meta-learning algorithm to learn higher-order gradients than a regular one. Experiment results on the fresh-cut lettuce datasets show that the proposed SQ-Swin outperforms the traditional methods and other deep learning-based backbones.
IVOct 3, 2022
Spectral2Spectral: Image-spectral Similarity Assisted Spectral CT Deep Reconstruction without ReferenceXiaodong Guo, Longhui Li, Dingyue Chang et al.
Spectral computed tomography based on a photon-counting detector (PCD) attracts more and more attentions since it has the capability to provide more accurate identification and quantitative analysis for biomedical materials. The limited number of photons within narrow energy bins leads to imaging results of low signal-noise ratio. The existing supervised deep reconstruction networks for CT reconstruction are difficult to address these challenges because it is usually impossible to acquire noise-free clinical images with clear structures as references. In this paper, we propose an iterative deep reconstruction network to synergize unsupervised method and data priors into a unified framework, named as Spectral2Spectral. Our Spectral2Spectral employs an unsupervised deep training strategy to obtain high-quality images from noisy data in an end-to-end fashion. The structural similarity prior within image-spectral domain is refined as a regularization term to further constrain the network training. The weights of neural network are automatically updated to capture image features and structures within the iterative process. Three large-scale preclinical datasets experiments demonstrate that the Spectral2spectral reconstructs better image quality than other the state-of-the-art methods.
IVOct 10, 2022
Masked Autoencoders for Low dose CT denoisingDayang Wang, Yongshun Xu, Shuo Han et al.
Low-dose computed tomography (LDCT) reduces the X-ray radiation but compromises image quality with more noises and artifacts. A plethora of transformer models have been developed recently to improve LDCT image quality. However, the success of a transformer model relies on a large amount of paired noisy and clean data, which is often unavailable in clinical applications. In computer vision and natural language processing fields, masked autoencoders (MAE) have been proposed as an effective label-free self-pretraining method for transformers, due to its excellent feature representation ability. Here, we redesign the classical encoder-decoder learning model to match the denoising task and apply it to LDCT denoising problem. The MAE can leverage the unlabeled data and facilitate structural preservation for the LDCT denoising model when ground truth data are missing. Experiments on the Mayo dataset validate that the MAE can boost the transformer's denoising performance and relieve the dependence on the ground truth data.
IVOct 19, 2023
LoMAE: Low-level Vision Masked Autoencoders for Low-dose CT DenoisingDayang Wang, Yongshun Xu, Shuo Han et al.
Low-dose computed tomography (LDCT) offers reduced X-ray radiation exposure but at the cost of compromised image quality, characterized by increased noise and artifacts. Recently, transformer models emerged as a promising avenue to enhance LDCT image quality. However, the success of such models relies on a large amount of paired noisy and clean images, which are often scarce in clinical settings. In the fields of computer vision and natural language processing, masked autoencoders (MAE) have been recognized as an effective label-free self-pretraining method for transformers, due to their exceptional feature representation ability. However, the original pretraining and fine-tuning design fails to work in low-level vision tasks like denoising. In response to this challenge, we redesign the classical encoder-decoder learning model and facilitate a simple yet effective low-level vision MAE, referred to as LoMAE, tailored to address the LDCT denoising problem. Moreover, we introduce an MAE-GradCAM method to shed light on the latent learning mechanisms of the MAE/LoMAE. Additionally, we explore the LoMAE's robustness and generability across a variety of noise levels. Experiments results show that the proposed LoMAE can enhance the transformer's denoising performance and greatly relieve the dependence on the ground truth clean data. It also demonstrates remarkable robustness and generalizability over a spectrum of noise levels.
LGJan 14, 2022Code
Manifoldron: Direct Space Partition via Manifold DiscoveryDayang Wang, Feng-Lei Fan, Bo-Jian Hou et al.
A neural network with the widely-used ReLU activation has been shown to partition the sample space into many convex polytopes for prediction. However, the parameterized way a neural network and other machine learning models use to partition the space has imperfections, \textit{e}.\textit{g}., the compromised interpretability for complex models, the inflexibility in decision boundary construction due to the generic character of the model, and the risk of being trapped into shortcut solutions. In contrast, although the non-parameterized models can adorably avoid or downplay these issues, they are usually insufficiently powerful either due to over-simplification or the failure to accommodate the manifold structures of data. In this context, we first propose a new type of machine learning models referred to as Manifoldron that directly derives decision boundaries from data and partitions the space via manifold structure discovery. Then, we systematically analyze the key characteristics of the Manifoldron such as manifold characterization capability and its link to neural networks. The experimental results on 4 synthetic examples, 20 public benchmark datasets, and 1 real-world application demonstrate that the proposed Manifoldron performs competitively compared to the mainstream machine learning models. We have shared our code in \url{https://github.com/wdayang/Manifoldron} for free download and evaluation.
IVNov 17, 2025
PoCGM: Poisson-Conditioned Generative Model for Sparse-View CT ReconstructionChangsheng Fang, Yongtong Liu, Bahareh Morovati et al.
In computed tomography (CT), reducing the number of projection views is an effective strategy to lower radiation exposure and/or improve temporal resolution. However, this often results in severe aliasing artifacts and loss of structural details in reconstructed images, posing significant challenges for clinical applications. Inspired by the success of the Poisson Flow Generative Model (PFGM++) in natural image generation, we propose a PoCGM (Poisson-Conditioned Generative Model) to address the challenges of sparse-view CT reconstruction. Since PFGM++ was originally designed for unconditional generation, it lacks direct applicability to medical imaging tasks that require integrating conditional inputs. To overcome this limitation, the PoCGM reformulates PFGM++ into a conditional generative framework by incorporating sparse-view data as guidance during both training and sampling phases. By modeling the posterior distribution of full-view reconstructions conditioned on sparse observations, PoCGM effectively suppresses artifacts while preserving fine structural details. Qualitative and quantitative evaluations demonstrate that PoCGM outperforms the baselines, achieving improved artifact suppression, enhanced detail preservation, and reliable performance in dose-sensitive and time-critical imaging scenarios.
IVSep 16, 2025
Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction for Sparse-View CTHaodong Li, Shuo Han, Haiyang Mao et al.
Sparse-View CT (SVCT) reconstruction enhances temporal resolution and reduces radiation dose, yet its clinical use is hindered by artifacts due to view reduction and domain shifts from scanner, protocol, or anatomical variations, leading to performance degradation in out-of-distribution (OOD) scenarios. In this work, we propose a Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction (CDPIR) framework to tackle the OOD problem in SVCT. CDPIR integrates cross-distribution diffusion priors, derived from a Scalable Interpolant Transformer (SiT), with model-based iterative reconstruction methods. Specifically, we train a SiT backbone, an extension of the Diffusion Transformer (DiT) architecture, to establish a unified stochastic interpolant framework, leveraging Classifier-Free Guidance (CFG) across multiple datasets. By randomly dropping the conditioning with a null embedding during training, the model learns both domain-specific and domain-invariant priors, enhancing generalizability. During sampling, the globally sensitive transformer-based diffusion model exploits the cross-distribution prior within the unified stochastic interpolant framework, enabling flexible and stable control over multi-distribution-to-noise interpolation paths and decoupled sampling strategies, thereby improving adaptation to OOD reconstruction. By alternating between data fidelity and sampling updates, our model achieves state-of-the-art performance with superior detail preservation in SVCT reconstructions. Extensive experiments demonstrate that CDPIR significantly outperforms existing approaches, particularly under OOD conditions, highlighting its robustness and potential clinical value in challenging imaging scenarios.
CVSep 1, 2025
Clinical Metadata Guided Limited-Angle CT Image ReconstructionYu Shi, Shuyi Fan, Changsheng Fang et al.
Limited-angle computed tomography (LACT) offers improved temporal resolution and reduced radiation dose for cardiac imaging, but suffers from severe artifacts due to truncated projections. To address the ill-posedness of LACT reconstruction, we propose a two-stage diffusion framework guided by structured clinical metadata. In the first stage, a transformer-based diffusion model conditioned exclusively on metadata, including acquisition parameters, patient demographics, and diagnostic impressions, generates coarse anatomical priors from noise. The second stage further refines the images by integrating both the coarse prior and metadata to produce high-fidelity results. Physics-based data consistency is enforced at each sampling step in both stages using an Alternating Direction Method of Multipliers module, ensuring alignment with the measured projections. Extensive experiments on both synthetic and real cardiac CT datasets demonstrate that incorporating metadata significantly improves reconstruction fidelity, particularly under severe angular truncation. Compared to existing metadata-free baselines, our method achieves superior performance in SSIM, PSNR, nMI, and PCC. Ablation studies confirm that different types of metadata contribute complementary benefits, particularly diagnostic and demographic priors under limited-angle conditions. These findings highlight the dual role of clinical metadata in improving both reconstruction quality and efficiency, supporting their integration into future metadata-guided medical imaging frameworks.
CVAug 9, 2025
Large Language Model Evaluated Stand-alone Attention-Assisted Graph Neural Network with Spatial and Structural Information Interaction for Precise Endoscopic Image SegmentationJuntong Fan, Shuyi Fan, Debesh Jha et al.
Accurate endoscopic image segmentation on the polyps is critical for early colorectal cancer detection. However, this task remains challenging due to low contrast with surrounding mucosa, specular highlights, and indistinct boundaries. To address these challenges, we propose FOCUS-Med, which stands for Fusion of spatial and structural graph with attentional context-aware polyp segmentation in endoscopic medical imaging. FOCUS-Med integrates a Dual Graph Convolutional Network (Dual-GCN) module to capture contextual spatial and topological structural dependencies. This graph-based representation enables the model to better distinguish polyps from background tissues by leveraging topological cues and spatial connectivity, which are often obscured in raw image intensities. It enhances the model's ability to preserve boundaries and delineate complex shapes typical of polyps. In addition, a location-fused stand-alone self-attention is employed to strengthen global context integration. To bridge the semantic gap between encoder-decoder layers, we incorporate a trainable weighted fast normalized fusion strategy for efficient multi-scale aggregation. Notably, we are the first to introduce the use of a Large Language Model (LLM) to provide detailed qualitative evaluations of segmentation quality. Extensive experiments on public benchmarks demonstrate that FOCUS-Med achieves state-of-the-art performance across five key metrics, underscoring its effectiveness and clinical potential for AI-assisted colonoscopy.
IVDec 3, 2024
$ρ$-NeRF: Leveraging Attenuation Priors in Neural Radiance Field for 3D Computed Tomography ReconstructionLi Zhou, Changsheng Fang, Bahareh Morovati et al.
This paper introduces $ρ$-NeRF, a self-supervised approach that sets a new standard in novel view synthesis (NVS) and computed tomography (CT) reconstruction by modeling a continuous volumetric radiance field enriched with physics-based attenuation priors. The $ρ$-NeRF represents a three-dimensional (3D) volume through a fully-connected neural network that takes a single continuous four-dimensional (4D) coordinate, spatial location $(x, y, z)$ and an initialized attenuation value ($ρ$), and outputs the attenuation coefficient at that position. By querying these 4D coordinates along X-ray paths, the classic forward projection technique is applied to integrate attenuation data across the 3D space. By matching and refining pre-initialized attenuation values derived from traditional reconstruction algorithms like Feldkamp-Davis-Kress algorithm (FDK) or conjugate gradient least squares (CGLS), the enriched schema delivers superior fidelity in both projection synthesis and image recognition.
MED-PHMar 19, 2024
Deep Few-view High-resolution Photon-counting CT at Halved Dose for Extremity ImagingMengzhou Li, Chuang Niu, Ge Wang et al.
X-ray photon-counting computed tomography (PCCT) for extremity allows multi-energy high-resolution (HR) imaging but its radiation dose can be further improved. Despite the great potential of deep learning techniques, their application in HR volumetric PCCT reconstruction has been challenged by the large memory burden, training data scarcity, and domain gap issues. In this paper, we propose a deep learning-based approach for PCCT image reconstruction at halved dose and doubled speed validated in a New Zealand clinical trial. Specifically, we design a patch-based volumetric refinement network to alleviate the GPU memory limitation, train network with synthetic data, and use model-based iterative refinement to bridge the gap between synthetic and clinical data. Our results in a reader study of 8 patients from the clinical trial demonstrate a great potential to cut the radiation dose to half that of the clinical PCCT standard without compromising image quality and diagnostic value.
IVFeb 28, 2022
CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT DenoisingDayang Wang, Fenglei Fan, Zhan Wu et al.
Low-dose computed tomography (LDCT) denoising is an important problem in CT research. Compared to the normal dose CT (NDCT), LDCT images are subjected to severe noise and artifacts. Recently in many studies, vision transformers have shown superior feature representation ability over convolutional neural networks (CNNs). However, unlike CNNs, the potential of vision transformers in LDCT denoising was little explored so far. To fill this gap, we propose a Convolution-free Token2Token Dilated Vision Transformer for low-dose CT denoising. The CTformer uses a more powerful token rearrangement to encompass local contextual information and thus avoids convolution. It also dilates and shifts feature maps to capture longer-range interaction. We interpret the CTformer by statically inspecting patterns of its internal attention maps and dynamically tracing the hierarchical attention flow with an explanatory graph. Furthermore, an overlapped inference mechanism is introduced to effectively eliminate the boundary artifacts that are common for encoder-decoder-based denoising models. Experimental results on Mayo LDCT dataset suggest that the CTformer outperforms the state-of-the-art denoising methods with a low computation overhead.
IVJun 17, 2021
AI-Enabled Ultra-Low-Dose CT ReconstructionWeiwen Wu, Chuang Niu, Shadi Ebrahimian et al.
By the ALARA (As Low As Reasonably Achievable) principle, ultra-low-dose CT reconstruction is a holy grail to minimize cancer risks and genetic damages, especially for children. With the development of medical CT technologies, the iterative algorithms are widely used to reconstruct decent CT images from a low-dose scan. Recently, artificial intelligence (AI) techniques have shown a great promise in further reducing CT radiation dose to the next level. In this paper, we demonstrate that AI-powered CT reconstruction offers diagnostic image quality at an ultra-low-dose level comparable to that of radiography. Specifically, here we develop a Split Unrolled Grid-like Alternative Reconstruction (SUGAR) network, in which deep learning, physical modeling and image prior are integrated. The reconstruction results from clinical datasets show that excellent images can be reconstructed using SUGAR from 36 projections. This approach has a potential to change future healthcare.
IVJun 8, 2021
TED-net: Convolution-free T2T Vision Transformer-based Encoder-decoder Dilation network for Low-dose CT DenoisingDayang Wang, Zhan Wu, Hengyong Yu
Low dose computed tomography is a mainstream for clinical applications. How-ever, compared to normal dose CT, in the low dose CT (LDCT) images, there are stronger noise and more artifacts which are obstacles for practical applications. In the last few years, convolution-based end-to-end deep learning methods have been widely used for LDCT image denoising. Recently, transformer has shown superior performance over convolution with more feature interactions. Yet its ap-plications in LDCT denoising have not been fully cultivated. Here, we propose a convolution-free T2T vision transformer-based Encoder-decoder Dilation net-work (TED-net) to enrich the family of LDCT denoising algorithms. The model is free of convolution blocks and consists of a symmetric encoder-decoder block with sole transformer. Our model is evaluated on the AAPM-Mayo clinic LDCT Grand Challenge dataset, and results show outperformance over the state-of-the-art denoising methods.
IVAug 4, 2020
Stabilizing Deep Tomographic ReconstructionWeiwen Wu, Dianlin Hu, Wenxiang Cong et al.
Tomographic image reconstruction with deep learning is an emerging field, but a recent landmark study reveals that several deep reconstruction networks are unstable for computed tomography (CT) and magnetic resonance imaging (MRI). Specifically, three kinds of instabilities were reported: (1) strong image artefacts from tiny perturbations, (2) small features missing in a deeply reconstructed image, and (3) decreased imaging performance with increased input data. On the other hand, compressed sensing (CS) inspired reconstruction methods do not suffer from these instabilities because of their built-in kernel awareness. For deep reconstruction to realize its full potential and become a mainstream approach for tomographic imaging, it is thus critically important to meet this challenge by stabilizing deep reconstruction networks. Here we propose an Analytic Compressed Iterative Deep (ACID) framework to address this challenge. ACID synergizes a deep reconstruction network trained on big data, kernel awareness from CS-inspired processing, and iterative refinement to minimize the data residual relative to real measurement. Our study demonstrates that the deep reconstruction using ACID is accurate and stable, and sheds light on the converging mechanism of the ACID iteration under a Bounded Relative Error Norm (BREN) condition. In particular, the study shows that ACID-based reconstruction is resilient against adversarial attacks, superior to classic sparsity-regularized reconstruction alone, and eliminates the three kinds of instabilities. We anticipate that this integrative data-driven approach will help promote development and translation of deep tomographic image reconstruction networks into clinical applications.
DCJul 17, 2020
EZLDA: Efficient and Scalable LDA on GPUsShilong Wang, Hang Liu, Anil Gaihre et al.
LDA is a statistical approach for topic modeling with a wide range of applications. However, there exist very few attempts to accelerate LDA on GPUs which come with exceptional computing and memory throughput capabilities. To this end, we introduce EZLDA which achieves efficient and scalable LDA training on GPUs with the following three contributions: First, EZLDA introduces three-branch sampling method which takes advantage of the convergence heterogeneity of various tokens to reduce the redundant sampling task. Second, to enable sparsity-aware format for both D and W on GPUs with fast sampling and updating, we introduce hybrid format for W along with corresponding token partition to T and inverted index designs. Third, we design a hierarchical workload balancing solution to address the extremely skewed workload imbalance problem on GPU and scaleEZLDA across multiple GPUs. Taken together, EZLDA achieves superior performance over the state-of-the-art attempts with lower memory consumption.
IVMay 30, 2020
MetaInv-Net: Meta Inversion Network for Sparse View CT Image ReconstructionHaimiao Zhang, Baodong Liu, Hengyong Yu et al.
X-ray Computed Tomography (CT) is widely used in clinical applications such as diagnosis and image-guided interventions. In this paper, we propose a new deep learning based model for CT image reconstruction with the backbone network architecture built by unrolling an iterative algorithm. However, unlike the existing strategy to include as many data-adaptive components in the unrolled dynamics model as possible, we find that it is enough to only learn the parts where traditional designs mostly rely on intuitions and experience. More specifically, we propose to learn an initializer for the conjugate gradient (CG) algorithm that involved in one of the subproblems of the backbone model. Other components, such as image priors and hyperparameters, are kept as the original design. Since a hypernetwork is introduced to inference on the initialization of the CG module, it makes the proposed model a certain meta-learning model. Therefore, we shall call the proposed model the meta-inversion network (MetaInv-Net). The proposed MetaInv-Net can be designed with much less trainable parameters while still preserves its superior image reconstruction performance than some state-of-the-art deep models in CT imaging. In simulated and real data experiments, MetaInv-Net performs very well and can be generalized beyond the training setting, i.e., to other scanning settings, noise levels, and data sets.
IVMay 6, 2019
DLIMD: Dictionary Learning based Image-domain Material Decomposition for spectral CTWeiwen Wu, Haijun Yu, Peijun Chen et al.
The potential huge advantage of spectral computed tomography (CT) is its capability to provide accuracy material identification and quantitative tissue information. This can benefit clinical applications, such as brain angiography, early tumor recognition, etc. To achieve more accurate material components with higher material image quality, we develop a dictionary learning based image-domain material decomposition (DLIMD) for spectral CT in this paper. First, we reconstruct spectral CT image from projections and calculate material coefficients matrix by selecting uniform regions of basis materials from image reconstruction results. Second, we employ the direct inversion (DI) method to obtain initial material decomposition results, and a set of image patches are extracted from the mode-1 unfolding of normalized material image tensor to train a united dictionary by the K-SVD technique. Third, the trained dictionary is employed to explore the similarities from decomposed material images by constructing the DLIMD model. Fourth, more constraints (i.e., volume conservation and the bounds of each pixel within material maps) are further integrated into the model to improve the accuracy of material decomposition. Finally, both physical phantom and preclinical experiments are employed to evaluate the performance of the proposed DLIMD in material decomposition accuracy, material image edge preservation and feature recovery.
LGNov 22, 2018
On a Sparse Shortcut Topology of Artificial Neural NetworksFenglei Fan, Dayang Wang, Hengtao Guo et al.
In established network architectures, shortcut connections are often used to take the outputs of earlier layers as additional inputs to later layers. Despite the extraordinary effectiveness of shortcuts, there remain open questions on the mechanism and characteristics. For example, why are shortcuts powerful? Why do shortcuts generalize well? In this paper, we investigate the expressivity and generalizability of a novel sparse shortcut topology. First, we demonstrate that this topology can empower a one-neuron-wide deep network to approximate any univariate continuous function. Then, we present a novel width-bounded universal approximator in contrast to depth-bounded universal approximators and extend the approximation result to a family of equally competent networks. Furthermore, with generalization bound theory, we show that the proposed shortcut topology enjoys excellent generalizability. Finally, we corroborate our theoretical analyses by comparing the proposed topology with popular architectures, including ResNet and DenseNet, on well-known benchmarks and perform a saliency map analysis to interpret the proposed topology. Our work helps enhance the understanding of the role of shortcuts and suggests further opportunities to innovate neural architectures.
CVOct 22, 2018
Block Matching Frame based Material Reconstruction for Spectral CTWeiwen Wu, Qian Wang, Fenglin Liu et al.
Spectral computed tomography (CT) has a great potential in material identification and decomposition. To achieve high-quality material composition images and further suppress the x-ray beam hardening artifacts, we first propose a one-step material reconstruction model based on Taylor first-order expansion. Then, we develop a basic material reconstruction method named material simultaneous algebraic reconstruction technique (MSART). Considering the local similarity of each material image, we incorporate a powerful block matching frame (BMF) into the material reconstruction (MR) model and generate a BMF based MR (BMFMR) method. Because the BMFMR model contains the L0-norm problem, we adopt a split-Bregman method for optimization. The numerical simulation and physical phantom experiment results validate the correctness of the material reconstruction algorithms and demonstrate that the BMF regularization outperforms the total variation and no-local mean regularizations.
CVJul 24, 2018
Non-local Low-rank Cube-based Tensor Factorization for Spectral CT ReconstructionWeiwen Wu, Fenglin Liu, Yanbo Zhang et al.
Spectral computed tomography (CT) reconstructs material-dependent attenuation images with the projections of multiple narrow energy windows, it is meaningful for material identification and decomposition. Unfortunately, the multi-energy projection dataset always contains strong complicated noise and result in the projections has a lower signal-noise-ratio (SNR). Very recently, the spatial-spectral cube matching frame (SSCMF) was proposed to explore the non-local spatial-spectrum similarities for spectral CT. The method constructs such a group by clustering up a series of non-local spatial-spectrum cubes. The small size of spatial patch for such a group make SSCMF fails to encode the sparsity and low-rank properties. In addition, the hard-thresholding and collaboration filtering operation in the SSCMF are also rough to recover the image features and spatial edges. While for all steps are operated on 4-D group, we may not afford such huge computational and memory load in practical. To avoid the above limitation and further improve image quality, we first formulate a non-local cube-based tensor instead of the group to encode the sparsity and low-rank properties. Then, as a new regularizer, Kronecker-Basis-Representation (KBR) tensor factorization is employed into a basic spectral CT reconstruction model to enhance the ability of extracting image features and protecting spatial edges, generating the non-local low-rank cube-based tensor factorization (NLCTF) method. Finally, the split-Bregman strategy is adopted to solve the NLCTF model. Both numerical simulations and realistic preclinical mouse studies are performed to validate and assess the NLCTF algorithm. The results show that the NLCTF method outperforms the other competitors.
CVDec 13, 2017
Low-dose spectral CT reconstruction using L0 image gradient and tensor dictionaryWeiwen Wu, Yanbo Zhang, Qian Wang et al.
Spectral computed tomography (CT) has a great superiority in lesion detection, tissue characterization and material decomposition. To further extend its potential clinical applications, in this work, we propose an improved tensor dictionary learning method for low-dose spectral CT reconstruction with a constraint of image gradient L0-norm, which is named as L0TDL. The L0TDL method inherits the advantages of tensor dictionary learning (TDL) by employing the similarity of spectral CT images. On the other hand, by introducing the L0-norm constraint in gradient image domain, the proposed method emphasizes the spatial sparsity to overcome the weakness of TDL on preserving edge information. The alternative direction minimization method (ADMM) is employed to solve the proposed method. Both numerical simulations and real mouse studies are perform to evaluate the proposed method. The results show that the proposed L0TDL method outperforms other competing methods, such as total variation (TV) minimization, TV with low rank (TV+LR), and TDL methods.
CVAug 3, 2017
Low Dose CT Image Denoising Using a Generative Adversarial Network with Wasserstein Distance and Perceptual LossQingsong Yang, Pingkun Yan, Yanbo Zhang et al.
In this paper, we introduce a new CT image denoising method based on the generative adversarial network (GAN) with Wasserstein distance and perceptual similarity. The Wasserstein distance is a key concept of the optimal transform theory, and promises to improve the performance of the GAN. The perceptual loss compares the perceptual features of a denoised output against those of the ground truth in an established feature space, while the GAN helps migrate the data noise distribution from strong to weak. Therefore, our proposed method transfers our knowledge of visual perception to the image denoising task, is capable of not only reducing the image noise level but also keeping the critical information at the same time. Promising results have been obtained in our experiments with clinical CT images.
QMNov 21, 2016
Deep Learning for the Classification of Lung NodulesHe Yang, Hengyong Yu, Ge Wang
Deep learning, as a promising new area of machine learning, has attracted a rapidly increasing attention in the field of medical imaging. Compared to the conventional machine learning methods, deep learning requires no hand-tuned feature extractor, and has shown a superior performance in many visual object recognition applications. In this study, we develop a deep convolutional neural network (CNN) and apply it to thoracic CT images for the classification of lung nodules. We present the CNN architecture and classification accuracy for the original images of lung nodules. In order to understand the features of lung nodules, we further construct new datasets, based on the combination of artificial geometric nodules and some transformations of the original images, as well as a stochastic nodule shape model. It is found that simplistic geometric nodules cannot capture the important features of lung nodules.
CVNov 22, 2013
Dictionary-Learning-Based Reconstruction Method for Electron TomographyBaodong Liu, Hengyong Yu, Scott S. Verbridge et al.
Electron tomography usually suffers from so called missing wedge artifacts caused by limited tilt angle range. An equally sloped tomography (EST) acquisition scheme (which should be called the linogram sampling scheme) was recently applied to achieve 2.4-angstrom resolution. On the other hand, a compressive sensing-inspired reconstruction algorithm, known as adaptive dictionary based statistical iterative reconstruction (ADSIR), has been reported for x-ray computed tomography. In this paper, we evaluate the EST, ADSIR and an ordered-subset simultaneous algebraic reconstruction technique (OS-SART), and compare the ES and equally angled (EA) data acquisition modes. Our results show that OS-SART is comparable to EST, and the ADSIR outperforms EST and OS-SART. Furthermore, the equally sloped projection data acquisition mode has no advantage over the conventional equally angled mode in the context.