Guang Yang

IV
h-index56
282papers
9,267citations
Novelty47%
AI Score60

282 Papers

IVMar 18, 2023Code
Diff-UNet: A Diffusion Embedded Network for Volumetric Segmentation

Zhaohu Xing, Liang Wan, Huazhu Fu et al.

In recent years, Denoising Diffusion Models have demonstrated remarkable success in generating semantically valuable pixel-wise representations for image generative modeling. In this study, we propose a novel end-to-end framework, called Diff-UNet, for medical volumetric segmentation. Our approach integrates the diffusion model into a standard U-shaped architecture to extract semantic information from the input volume effectively, resulting in excellent pixel-level representations for medical volumetric segmentation. To enhance the robustness of the diffusion model's prediction results, we also introduce a Step-Uncertainty based Fusion (SUF) module during inference to combine the outputs of the diffusion models at each step. We evaluate our method on three datasets, including multimodal brain tumors in MRI, liver tumors, and multi-organ CT volumes, and demonstrate that Diff-UNet outperforms other state-of-the-art methods significantly. Our experimental results also indicate the universality and effectiveness of the proposed model. The proposed framework has the potential to facilitate the accurate diagnosis and treatment of medical conditions by enabling more precise segmentation of anatomical structures. The codes of Diff-UNet are available at https://github.com/ge-xing/Diff-UNet

CLMar 14, 2022
Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

Ning Ding, Yujia Qin, Guang Yang et al. · tsinghua

Despite the success, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed as delta tuning in this paper. In contrast with the standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In this paper, we first formally describe the problem of delta tuning and then comprehensively review recent delta tuning approaches. We also propose a unified categorization criterion that divide existing delta tuning methods into three groups: addition-based, specification-based, and reparameterization-based methods. Though initially proposed as an efficient method to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks. To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret delta tuning from the perspective of optimization and optimal control, respectively. Furthermore, we provide a holistic empirical study of representative methods, where results on over 100 NLP tasks demonstrate a comprehensive performance comparison of different approaches. The experimental results also cover the analysis of combinatorial, scaling and transferable properties of delta tuning.

IVSep 15, 2022Code
MIPI 2022 Challenge on Quad-Bayer Re-mosaic: Dataset and Report

Qingyu Yang, Guang Yang, Jun Jiang et al.

Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). To bridge the gap, we introduce the first MIPI challenge, including five tracks focusing on novel image sensors and imaging algorithms. In this paper, Quad Joint Remosaic and Denoise, one of the five tracks, working on the interpolation of Quad CFA to Bayer at full resolution, is introduced. The participants were provided a new dataset, including 70 (training) and 15 (validation) scenes of high-quality Quad and Bayer pairs. In addition, for each scene, Quad of different noise levels was provided at 0dB, 24dB, and 42dB. All the data were captured using a Quad sensor in both outdoor and indoor conditions. The final results are evaluated using objective metrics, including PSNR, SSIM, LPIPS, and KLD. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at https://github.com/mipi-challenge/MIPI2022.

CLApr 10, 2025
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

ByteDance Seed, Jiaze Chen, Tiantian Fan et al. · bytedance

We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial link: https://www.volcengine.com/experience/ark.

CVSep 15, 2022Code
MIPI 2022 Challenge on RGBW Sensor Re-mosaic: Dataset and Report

Qingyu Yang, Guang Yang, Jun Jiang et al.

Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). To bridge the gap, we introduce the first MIPI challenge including five tracks focusing on novel image sensors and imaging algorithms. In this paper, RGBW Joint Remosaic and Denoise, one of the five tracks, working on the interpolation of RGBW CFA to Bayer at full resolution, is introduced. The participants were provided with a new dataset including 70 (training) and 15 (validation) scenes of high-quality RGBW and Bayer pairs. In addition, for each scene, RGBW of different noise levels was provided at 0dB, 24dB, and 42dB. All the data were captured using an RGBW sensor in both outdoor and indoor conditions. The final results are evaluated using objective metrics including PSNR, SSIM, LPIPS, and KLD. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at https://github.com/mipi-challenge/MIPI2022.

IVSep 15, 2022Code
MIPI 2022 Challenge on RGBW Sensor Fusion: Dataset and Report

Qingyu Yang, Guang Yang, Jun Jiang et al.

Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). To bridge the gap, we introduce the first MIPI challenge, including five tracks focusing on novel image sensors and imaging algorithms. In this paper, RGBW Joint Fusion and Denoise, one of the five tracks, working on the fusion of binning-mode RGBW to Bayer, is introduced. The participants were provided with a new dataset including 70 (training) and 15 (validation) scenes of high-quality RGBW and Bayer pairs. In addition, for each scene, RGBW of different noise levels was provided at 24dB and 42dB. All the data were captured using an RGBW sensor in both outdoor and indoor conditions. The final results are evaluated using objective metrics, including PSNR, SSIM}, LPIPS, and KLD. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at https://github.com/mipi-challenge/MIPI2022.

IVJun 25, 2023Code
CDiffMR: Can We Replace the Gaussian Noise with K-Space Undersampling for Fast MRI?

Jiahao Huang, Angelica Aviles-Rivero, Carola-Bibiane Schönlieb et al.

Deep learning has shown the capability to substantially accelerate MRI reconstruction while acquiring fewer measurements. Recently, diffusion models have gained burgeoning interests as a novel group of deep learning-based generative methods. These methods seek to sample data points that belong to a target distribution from a Gaussian distribution, which has been successfully extended to MRI reconstruction. In this work, we proposed a Cold Diffusion-based MRI reconstruction method called CDiffMR. Different from conventional diffusion models, the degradation operation of our CDiffMR is based on \textit{k}-space undersampling instead of adding Gaussian noise, and the restoration network is trained to harness a de-aliaseing function. We also design starting point and data consistency conditioning strategies to guide and accelerate the reverse process. More intriguingly, the pre-trained CDiffMR model can be reused for reconstruction tasks with different undersampling rates. We demonstrated, through extensive numerical and visual experiments, that the proposed CDiffMR can achieve comparable or even superior reconstruction results than state-of-the-art models. Compared to the diffusion model-based counterpart, CDiffMR reaches readily competing results using only $1.6 \sim 3.4\%$ for inference time. The code is publicly available at https://github.com/ayanglab/CDiffMR.

AISep 2, 2022Code
Multi-Modal Experience Inspired AI Creation

Qian Cao, Xu Chen, Ruihua Song et al.

AI creation, such as poem or lyrics generation, has attracted increasing attention from both industry and academic communities, with many promising models proposed in the past few years. Existing methods usually estimate the outputs based on single and independent visual or textual information. However, in reality, humans usually make creations according to their experiences, which may involve different modalities and be sequentially correlated. To model such human capabilities, in this paper, we define and solve a novel AI creation problem based on human experiences. More specifically, we study how to generate texts based on sequential multi-modal information. Compared with the previous works, this task is much more difficult because the designed model has to well understand and adapt the semantics among different modalities and effectively convert them into the output in a sequential manner. To alleviate these difficulties, we firstly design a multi-channel sequence-to-sequence architecture equipped with a multi-modal attention network. For more effective optimization, we then propose a curriculum negative sampling strategy tailored for the sequential inputs. To benchmark this problem and demonstrate the effectiveness of our model, we manually labeled a new multi-modal experience dataset. With this dataset, we conduct extensive experiments by comparing our model with a series of representative baselines, where we can demonstrate significant improvements in our model based on both automatic and human-centered metrics. The code and data are available at: \url{https://github.com/Aman-4-Real/MMTG}.

IVSep 5, 2022
Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation

Yang Nan, Javier Del Ser, Zeyu Tang et al.

Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases, while its manual delineation is unduly burdensome. To alleviate this time-consuming and potentially subjective manual procedure, researchers have proposed methods to automatically segment airways from computerized tomography (CT) images. However, some small-sized airway branches (e.g., bronchus and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation by machine learning models. In particular, the variance of voxel values and the severe data imbalance in airway branches make the computational module prone to discontinuous and false-negative predictions. especially for cohorts with different lung diseases. Attention mechanism has shown the capacity to segment complex structures, while fuzzy logic can reduce the uncertainty in feature representations. Therefore, the integration of deep attention networks and fuzzy theory, given by the fuzzy attention layer, should be an escalated solution for better generalization and robustness. This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function to enhance the spatial continuity of airway segmentation. The deep fuzzy set is formulated by a set of voxels in the feature map and a learnable Gaussian membership function. Different from the existing attention mechanism, the proposed channel-specific fuzzy attention addresses the issue of heterogeneous features in different channels. Furthermore, a novel evaluation metric is proposed to assess both the continuity and completeness of airway structures. The efficiency, generalization and robustness of the proposed method have been proved by training on normal lung disease while testing on datasets of lung cancer, COVID-19 and pulmonary fibrosis.

CVJul 5, 2022Code
Swin Deformable Attention U-Net Transformer (SDAUT) for Explainable Fast MRI

Jiahao Huang, Xiaodan Xing, Zhifan Gao et al.

Fast MRI aims to reconstruct a high fidelity image from partially observed measurements. Exuberant development in fast MRI using deep learning has been witnessed recently. Meanwhile, novel deep learning paradigms, e.g., Transformer based models, are fast-growing in natural language processing and promptly developed for computer vision and medical image analysis due to their prominent performance. Nevertheless, due to the complexity of the Transformer, the application of fast MRI may not be straightforward. The main obstacle is the computational cost of the self-attention layer, which is the core part of the Transformer, can be expensive for high resolution MRI inputs. In this study, we propose a new Transformer architecture for solving fast MRI that coupled Shifted Windows Transformer with U-Net to reduce the network complexity. We incorporate deformable attention to construe the explainability of our reconstruction model. We empirically demonstrate that our method achieves consistently superior performance on the fast MRI task. Besides, compared to state-of-the-art Transformer models, our method has fewer network parameters while revealing explainability. The code is publicly available at https://github.com/ayanglab/SDAUT.

IVMar 10, 2023
Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation

Minghui Zhang, Yangqian Wu, Hanxiao Zhang et al. · harvard

Open international challenges are becoming the de facto standard for assessing computer vision and image analysis algorithms. In recent years, new methods have extended the reach of pulmonary airway segmentation that is closer to the limit of image resolution. Since EXACT'09 pulmonary airway segmentation, limited effort has been directed to quantitative comparison of newly emerged algorithms driven by the maturity of deep learning based approaches and clinical drive for resolving finer details of distal airways for early intervention of pulmonary diseases. Thus far, public annotated datasets are extremely limited, hindering the development of data-driven methods and detailed performance evaluation of new algorithms. To provide a benchmark for the medical imaging community, we organized the Multi-site, Multi-domain Airway Tree Modeling (ATM'22), which was held as an official challenge event during the MICCAI 2022 conference. ATM'22 provides large-scale CT scans with detailed pulmonary airway annotation, including 500 CT scans (300 for training, 50 for validation, and 150 for testing). The dataset was collected from different sites and it further included a portion of noisy COVID-19 CTs with ground-glass opacity and consolidation. Twenty-three teams participated in the entire phase of the challenge and the algorithms for the top ten teams are reviewed in this paper. Quantitative and qualitative results revealed that deep learning models embedded with the topological continuity enhancement achieved superior performance in general. ATM'22 challenge holds as an open-call design, the training data and the gold standard evaluation are available upon successful registration via its homepage.

IVJan 23, 2023Code
Is Autoencoder Truly Applicable for 3D CT Super-Resolution?

Weixun Luo, Xiaodan Xing, Guang Yang

Featured by a bottleneck structure, autoencoder (AE) and its variants have been largely applied in various medical image analysis tasks, such as segmentation, reconstruction and de-noising. Despite of their promising performances in aforementioned tasks, in this paper, we claim that AE models are not applicable to single image super-resolution (SISR) for 3D CT data. Our hypothesis is that the bottleneck architecture that resizes feature maps in AE models degrades the details of input images, thus can sabotage the performance of super-resolution. Although U-Net proposed skip connections that merge information from different levels, we claim that the degrading impact of feature resizing operations could hardly be removed by skip connections. By conducting large-scale ablation experiments and comparing the performance between models with and without the bottleneck design on a public CT lung dataset , we have discovered that AE models, including U-Net, have failed to achieve a compatible SISR result ($p<0.05$ by Student's t-test) compared to the baseline model. Our work is the first comparative study investigating the suitability of AE architecture for 3D CT SISR tasks and brings a rationale for researchers to re-think the choice of model architectures especially for 3D CT SISR tasks. The full implementation and trained models can be found at: https://github.com/Roldbach/Autoencoder-3D-CT-SISR

SYMar 8, 2019
Self-triggered Control for Safety Critical Systems using Control Barrier Functions

Guang Yang, Calin Belta, Roberto Tron

We propose a real-time control strategy that combines self-triggered control with Control Lyapunov Functions (CLF) and Control Barrier Functions (CBF). Similar to related works proposing CLF-CBF-based controllers, the computation of the controller is achieved by solving a Quadratic Program (QP). However, we propose a Zeroth-Order Hold (ZOH) implementation of the controller that overcomes the main limitations of traditional approaches based on periodic controllers, i.e., unnecessary controller updates and potential violations of the safety constraints. Central to our approach is the novel notion of safe period, which enforces a strong safety guarantee for implementing ZOH control. In addition, we prove that the system does not exhibit a Zeno behavior as it approaches the desired equilibrium.

MED-PHMar 31, 2023
Deep Learning-based Diffusion Tensor Cardiac Magnetic Resonance Reconstruction: A Comparison Study

Jiahao Huang, Pedro F. Ferreira, Lichao Wang et al.

In vivo cardiac diffusion tensor imaging (cDTI) is a promising Magnetic Resonance Imaging (MRI) technique for evaluating the micro-structure of myocardial tissue in the living heart, providing insights into cardiac function and enabling the development of innovative therapeutic strategies. However, the integration of cDTI into routine clinical practice is challenging due to the technical obstacles involved in the acquisition, such as low signal-to-noise ratio and long scanning times. In this paper, we investigate and implement three different types of deep learning-based MRI reconstruction models for cDTI reconstruction. We evaluate the performance of these models based on reconstruction quality assessment and diffusion tensor parameter assessment. Our results indicate that the models we discussed in this study can be applied for clinical use at an acceleration factor (AF) of $\times 2$ and $\times 4$, with the D5C5 model showing superior fidelity for reconstruction and the SwinMR model providing higher perceptual scores. There is no statistical difference with the reference for all diffusion tensor parameters at AF $\times 2$ or most DT parameters at AF $\times 4$, and the quality of most diffusion tensor parameter maps are visually acceptable. SwinMR is recommended as the optimal approach for reconstruction at AF $\times 2$ and AF $\times 4$. However, we believed the models discussed in this studies are not prepared for clinical use at a higher AF. At AF $\times 8$, the performance of all models discussed remains limited, with only half of the diffusion tensor parameters being recovered to a level with no statistical difference from the reference. Some diffusion tensor parameter maps even provide wrong and misleading information.

IVMar 21, 2022
ME-Net: Multi-Encoder Net Framework for Brain Tumor Segmentation

Wenbo Zhang, Guang Yang, He Huang et al.

Glioma is the most common and aggressive brain tumor. Magnetic resonance imaging (MRI) plays a vital role to evaluate tumors for the arrangement of tumor surgery and the treatment of subsequent procedures. However, the manual segmentation of the MRI image is strenuous, which limits its clinical application. With the development of deep learning, a large number of automatic segmentation methods have been developed, but most of them stay in 2D images, which leads to subpar performance. Moreover, the serious voxel imbalance between the brain tumor and the background as well as the different sizes and locations of the brain tumor makes the segmentation of 3D images a challenging problem. Aiming at segmenting 3D MRI, we propose a model for brain tumor segmentation with multiple encoders. The structure contains four encoders and one decoder. The four encoders correspond to the four modalities of the MRI image, perform one-to-one feature extraction, and then merge the feature maps of the four modalities into the decoder. This method reduces the difficulty of feature extraction and greatly improves model performance. We also introduced a new loss function named "Categorical Dice", and set different weights for different segmented regions at the same time, which solved the problem of voxel imbalance. We evaluated our approach using the online BraTS 2020 Challenge verification. Our proposed method can achieve promising results in the validation set compared to the state-of-the-art approaches with Dice scores of 0.70249, 0.88267, and 0.73864 for the intact tumor, tumor core, and enhanced tumor, respectively.

IVAug 11, 2022
Region-Based Evidential Deep Learning to Quantify Uncertainty and Improve Robustness of Brain Tumor Segmentation

Hao Li, Yang Nan, Javier Del Ser et al.

Despite recent advances in the accuracy of brain tumor segmentation, the results still suffer from low reliability and robustness. Uncertainty estimation is an efficient solution to this problem, as it provides a measure of confidence in the segmentation results. The current uncertainty estimation methods based on quantile regression, Bayesian neural network, ensemble, and Monte Carlo dropout are limited by their high computational cost and inconsistency. In order to overcome these challenges, Evidential Deep Learning (EDL) was developed in recent work but primarily for natural image classification. In this paper, we proposed a region-based EDL segmentation framework that can generate reliable uncertainty maps and robust segmentation results. We used the Theory of Evidence to interpret the output of a neural network as evidence values gathered from input features. Following Subjective Logic, evidence was parameterized as a Dirichlet distribution, and predicted probabilities were treated as subjective opinions. To evaluate the performance of our model on segmentation and uncertainty estimation, we conducted quantitative and qualitative experiments on the BraTS 2020 dataset. The results demonstrated the top performance of the proposed method in quantifying segmentation uncertainty and robustly segmenting tumors. Furthermore, our proposed new framework maintained the advantages of low computational cost and easy implementation and showed the potential for clinical application.

IVJul 19, 2022
Large-Kernel Attention for 3D Medical Image Segmentation

Hao Li, Yang Nan, Javier Del Ser et al.

Automatic segmentation of multiple organs and tumors from 3D medical images such as magnetic resonance imaging (MRI) and computed tomography (CT) scans using deep learning methods can aid in diagnosing and treating cancer. However, organs often overlap and are complexly connected, characterized by extensive anatomical variation and low contrast. In addition, the diversity of tumor shape, location, and appearance, coupled with the dominance of background voxels, makes accurate 3D medical image segmentation difficult. In this paper, a novel large-kernel (LK) attention module is proposed to address these problems to achieve accurate multi-organ segmentation and tumor segmentation. The advantages of convolution and self-attention are combined in the proposed LK attention module, including local contextual information, long-range dependence, and channel adaptation. The module also decomposes the LK convolution to optimize the computational cost and can be easily incorporated into FCNs such as U-Net. Comprehensive ablation experiments demonstrated the feasibility of convolutional decomposition and explored the most efficient and effective network design. Among them, the best Mid-type LK attention-based U-Net network was evaluated on CT-ORG and BraTS 2020 datasets, achieving state-of-the-art segmentation performance. The performance improvement due to the proposed LK attention module was also statistically validated.

CVJul 8, 2024Code
HilbertMamba: Local-Global Reciprocal Network for Uterine Fibroid Segmentation in Ultrasound Videos

Huihui Xu, Yijun Yang, Angelica I Aviles-Rivero et al.

Regular screening and early discovery of uterine fibroid are crucial for preventing potential malignant transformations and ensuring timely, life-saving interventions. To this end, we collect and annotate the first ultrasound video dataset with 100 videos for uterine fibroid segmentation (UFUV). We also present Local-Global Reciprocal Network (LGRNet) to efficiently and effectively propagate the long-term temporal context which is crucial to help distinguish between uninformative noisy surrounding tissues and target lesion regions. Specifically, the Cyclic Neighborhood Propagation (CNP) is introduced to propagate the inter-frame local temporal context in a cyclic manner. Moreover, to aggregate global temporal context, we first condense each frame into a set of frame bottleneck queries and devise Hilbert Selective Scan (HilbertSS) to both efficiently path connect each frame and preserve the locality bias. A distribute layer is then utilized to disseminate back the global context for reciprocal refinement. Extensive experiments on UFUV and three public Video Polyp Segmentation (VPS) datasets demonstrate consistent improvements compared to state-of-the-art segmentation methods, indicating the effectiveness and versatility of LGRNet. Code, checkpoints, and dataset are available at https://github.com/bio-mlhui/LGRNet

CVSep 6, 2023
DMKD: Improving Feature-based Knowledge Distillation for Object Detection Via Dual Masking Augmentation

Guang Yang, Yin Tang, Zhijian Wu et al.

Recent mainstream masked distillation methods function by reconstructing selectively masked areas of a student network from the feature map of its teacher counterpart. In these methods, the masked regions need to be properly selected, such that reconstructed features encode sufficient discrimination and representation capability like the teacher feature. However, previous masked distillation methods only focus on spatial masking, making the resulting masked areas biased towards spatial importance without encoding informative channel clues. In this study, we devise a Dual Masked Knowledge Distillation (DMKD) framework which can capture both spatially important and channel-wise informative clues for comprehensive masked feature reconstruction. More specifically, we employ dual attention mechanism for guiding the respective masking branches, leading to reconstructed feature encoding dual significance. Furthermore, fusing the reconstructed features is achieved by self-adjustable weighting strategy for effective feature distillation. Our experiments on object detection task demonstrate that the student networks achieve performance gains of 4.1% and 4.3% with the help of our method when RetinaNet and Cascade Mask R-CNN are respectively used as the teacher networks, while outperforming the other state-of-the-art distillation methods.

IVJul 12, 2022
Human Treelike Tubular Structure Segmentation: A Comprehensive Review and Future Perspectives

Hao Li, Zeyu Tang, Yang Nan et al.

Various structures in human physiology follow a treelike morphology, which often expresses complexity at very fine scales. Examples of such structures are intrathoracic airways, retinal blood vessels, and hepatic blood vessels. Large collections of 2D and 3D images have been made available by medical imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), Optical coherence tomography (OCT) and ultrasound in which the spatial arrangement can be observed. Segmentation of these structures in medical imaging is of great importance since the analysis of the structure provides insights into disease diagnosis, treatment planning, and prognosis. Manually labelling extensive data by radiologists is often time-consuming and error-prone. As a result, automated or semi-automated computational models have become a popular research field of medical imaging in the past two decades, and many have been developed to date. In this survey, we aim to provide a comprehensive review of currently publicly available datasets, segmentation algorithms, and evaluation metrics. In addition, current challenges and future research directions are discussed.

IVMar 9, 2022
HDL: Hybrid Deep Learning for the Synthesis of Myocardial Velocity Maps in Digital Twins for Cardiac Analysis

Xiaodan Xing, Javier Del Ser, Yinzhe Wu et al.

Synthetic digital twins based on medical data accelerate the acquisition, labelling and decision making procedure in digital healthcare. A core part of digital healthcare twins is model-based data synthesis, which permits the generation of realistic medical signals without requiring to cope with the modelling complexity of anatomical and biochemical phenomena producing them in reality. Unfortunately, algorithms for cardiac data synthesis have been so far scarcely studied in the literature. An important imaging modality in the cardiac examination is three-directional CINE multi-slice myocardial velocity mapping (3Dir MVM), which provides a quantitative assessment of cardiac motion in three orthogonal directions of the left ventricle. The long acquisition time and complex acquisition produce make it more urgent to produce synthetic digital twins of this imaging modality. In this study, we propose a hybrid deep learning (HDL) network, especially for synthetic 3Dir MVM data. Our algorithm is featured by a hybrid UNet and a Generative Adversarial Network with a foreground-background generation scheme. The experimental results show that from temporally down-sampled magnitude CINE images (six times), our proposed algorithm can still successfully synthesise high temporal resolution 3Dir MVM CMR data (PSNR=42.32) with precise left ventricle segmentation (DICE=0.92). These performance scores indicate that our proposed HDL algorithm can be implemented in real-world digital twins for myocardial velocity mapping data simulation. To the best of our knowledge, this work is the first one in the literature investigating digital twins of the 3Dir MVM CMR, which has shown great potential for improving the efficiency of clinical studies via synthesised cardiac data.

IVNov 2, 2023Code
Dynamic Multimodal Information Bottleneck for Multimodality Classification

Yingying Fang, Shuang Wu, Sheng Zhang et al.

Effectively leveraging multimodal data such as various images, laboratory tests and clinical information is gaining traction in a variety of AI-based medical diagnosis and prognosis tasks. Most existing multi-modal techniques only focus on enhancing their performance by leveraging the differences or shared features from various modalities and fusing feature across different modalities. These approaches are generally not optimal for clinical settings, which pose the additional challenges of limited training data, as well as being rife with redundant data or noisy modality channels, leading to subpar performance. To address this gap, we study the robustness of existing methods to data redundancy and noise and propose a generalized dynamic multimodal information bottleneck framework for attaining a robust fused feature representation. Specifically, our information bottleneck module serves to filter out the task-irrelevant information and noises in the fused feature, and we further introduce a sufficiency loss to prevent dropping of task-relevant information, thus explicitly preserving the sufficiency of prediction information in the distilled feature. We validate our model on an in-house and a public COVID19 dataset for mortality prediction as well as two public biomedical datasets for diagnostic tasks. Extensive experiments show that our method surpasses the state-of-the-art and is significantly more robust, being the only method to remain performance when large-scale noisy channels exist. Our code is publicly available at https://github.com/ayanglab/DMIB.

CVSep 19, 2023
CMRxRecon: An open cardiac MRI dataset for the competition of accelerated image reconstruction

Chengyan Wang, Jun Lyu, Shuo Wang et al.

Cardiac magnetic resonance imaging (CMR) has emerged as a valuable diagnostic tool for cardiac diseases. However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts in the images. There has been growing interest in deep learning-based CMR imaging algorithms that can reconstruct high-quality images from highly under-sampled k-space data. However, the development of deep learning methods requires large training datasets, which have not been publicly available for CMR. To address this gap, we released a dataset that includes multi-contrast, multi-view, multi-slice and multi-coil CMR imaging data from 300 subjects. Imaging studies include cardiac cine and mapping sequences. Manual segmentations of the myocardium and chambers of all the subjects are also provided within the dataset. Scripts of state-of-the-art reconstruction algorithms were also provided as a point of reference. Our aim is to facilitate the advancement of state-of-the-art CMR image reconstruction by introducing standardized evaluation criteria and making the dataset freely accessible to the research community. Researchers can access the dataset at https://www.synapse.org/#!Synapse:syn51471091/wiki/.

IVNov 11, 2023Code
Stain Consistency Learning: Handling Stain Variation for Automatic Digital Pathology Segmentation

Michael Yeung, Todd Watts, Sean YW Tan et al.

Stain variation is a unique challenge associated with automated analysis of digital pathology. Numerous methods have been developed to improve the robustness of machine learning methods to stain variation, but comparative studies have demonstrated limited benefits to performance. Moreover, methods to handle stain variation were largely developed for H&E stained data, with evaluation generally limited to classification tasks. Here we propose Stain Consistency Learning, a novel framework combining stain-specific augmentation with a stain consistency loss function to learn stain colour invariant features. We perform the first, extensive comparison of methods to handle stain variation for segmentation tasks, comparing ten methods on Masson's trichrome and H&E stained cell and nuclei datasets, respectively. We observed that stain normalisation methods resulted in equivalent or worse performance, while stain augmentation or stain adversarial methods demonstrated improved performance, with the best performance consistently achieved by our proposed approach. The code is available at: https://github.com/mlyg/stain_consistency_learning

IVApr 1, 2022
Data and Physics Driven Learning Models for Fast MRI -- Fundamentals and Methodologies from CNN, GAN to Attention and Transformers

Jiahao Huang, Yingying Fang, Yang Nan et al.

Research studies have shown no qualms about using data driven deep learning models for downstream tasks in medical image analysis, e.g., anatomy segmentation and lesion detection, disease diagnosis and prognosis, and treatment planning. However, deep learning models are not the sovereign remedy for medical image analysis when the upstream imaging is not being conducted properly (with artefacts). This has been manifested in MRI studies, where the scanning is typically slow, prone to motion artefacts, with a relatively low signal to noise ratio, and poor spatial and/or temporal resolution. Recent studies have witnessed substantial growth in the development of deep learning techniques for propelling fast MRI. This article aims to (1) introduce the deep learning based data driven techniques for fast MRI including convolutional neural network and generative adversarial network based methods, (2) survey the attention and transformer based models for speeding up MRI reconstruction, and (3) detail the research in coupling physics and data driven models for MRI acceleration. Finally, we will demonstrate through a few clinical applications, explain the importance of data harmonisation and explainable models for such fast MRI techniques in multicentre and multi-scanner studies, and discuss common pitfalls in current research and recommendations for future research directions.

IVAug 25, 2022
A survey, review, and future trends of skin lesion segmentation and classification

Md. Kamrul Hasan, Md. Asif Ahamad, Choon Hwai Yap et al.

The Computer-aided Diagnosis or Detection (CAD) approach for skin lesion analysis is an emerging field of research that has the potential to alleviate the burden and cost of skin cancer screening. Researchers have recently indicated increasing interest in developing such CAD systems, with the intention of providing a user-friendly tool to dermatologists to reduce the challenges encountered or associated with manual inspection. This article aims to provide a comprehensive literature survey and review of a total of 594 publications (356 for skin lesion segmentation and 238 for skin lesion classification) published between 2011 and 2022. These articles are analyzed and summarized in a number of different ways to contribute vital information regarding the methods for the development of CAD systems. These ways include relevant and essential definitions and theories, input data (dataset utilization, preprocessing, augmentations, and fixing imbalance problems), method configuration (techniques, architectures, module frameworks, and losses), training tactics (hyperparameter settings), and evaluation criteria. We intend to investigate a variety of performance-enhancing approaches, including ensemble and post-processing. We also discuss these dimensions to reveal their current trends based on utilization frequencies. In addition, we highlight the primary difficulties associated with evaluating skin lesion segmentation and classification systems using minimal datasets, as well as the potential solutions to these difficulties. Findings, recommendations, and trends are disclosed to inform future research on developing an automated and robust CAD system for skin lesion analysis.

IVJul 25, 2023
One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

Zi Wang, Xiaotong Yu, Chengyan Wang et al.

Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep Learning (DL) has proven effective for fast MRI image reconstruction, its broader applicability across various imaging scenarios has been constrained. Challenges include the high cost and privacy restrictions associated with acquiring large-scale, diverse training data, coupled with the inherent difficulty of addressing mismatches between training and target data in existing DL methodologies. Here, we present a novel Physics-Informed Synthetic data learning framework for Fast MRI, called PISF. PISF marks a breakthrough by enabling generalized DL for multi-scenario MRI reconstruction through a single trained model. Our approach separates the reconstruction of a 2D image into many 1D basic problems, commencing with 1D data synthesis to facilitate generalization. We demonstrate that training DL models on synthetic data, coupled with enhanced learning techniques, yields in vivo MRI reconstructions comparable to or surpassing those of models trained on matched realistic datasets, reducing the reliance on real-world MRI data by up to 96%. Additionally, PISF exhibits remarkable generalizability across multiple vendors and imaging centers. Its adaptability to diverse patient populations has been validated through evaluations by ten experienced medical professionals. PISF presents a feasible and cost-effective way to significantly boost the widespread adoption of DL in various fast MRI applications.

MLOct 24, 2022
Deep Kronecker Network

Long Feng, Guang Yang

We propose Deep Kronecker Network (DKN), a novel framework designed for analyzing medical imaging data, such as MRI, fMRI, CT, etc. Medical imaging data is different from general images in at least two aspects: i) sample size is usually much more limited, ii) model interpretation is more of a concern compared to outcome prediction. Due to its unique nature, general methods, such as convolutional neural network (CNN), are difficult to be directly applied. As such, we propose DKN, that is able to i) adapt to low sample size limitation, ii) provide desired model interpretation, and iii) achieve the prediction power as CNN. The DKN is general in the sense that it not only works for both matrix and (high-order) tensor represented image data, but also could be applied to both discrete and continuous outcomes. The DKN is built on a Kronecker product structure and implicitly imposes a piecewise smooth property on coefficients. Moreover, the Kronecker structure can be written into a convolutional form, so DKN also resembles a CNN, particularly, a fully convolutional network (FCN). Furthermore, we prove that with an alternating minimization algorithm, the solutions of DKN are guaranteed to converge to the truth geometrically even if the objective function is highly nonconvex. Interestingly, the DKN is also highly connected to the tensor regression framework proposed by Zhou et al. (2010), where a CANDECOMP/PARAFAC (CP) low-rank structure is imposed on tensor coefficients. Finally, we conduct both classification and regression analyses using real MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) to demonstrate the effectiveness of DKN.

AIAug 20, 2024
Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

Yun Qu, Boyuan Wang, Jianzhun Shao et al. · tsinghua

The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets that covers both offline RL and offline MARL, accompanied by a robust framework, to facilitate further research. This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game known for its intricate nature, closely resembling real-life situations. Utilizing this framework, we benchmark a variety of offline RL and offline MARL algorithms. We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game. We reveal the incompetency of current offline RL approaches in handling task complexity, generalization and multi-task learning.

IVSep 27, 2023Code
Style Transfer and Self-Supervised Learning Powered Myocardium Infarction Super-Resolution Segmentation

Lichao Wang, Jiahao Huang, Xiaodan Xing et al.

This study proposes a pipeline that incorporates a novel style transfer model and a simultaneous super-resolution and segmentation model. The proposed pipeline aims to enhance diffusion tensor imaging (DTI) images by translating them into the late gadolinium enhancement (LGE) domain, which offers a larger amount of data with high-resolution and distinct highlighting of myocardium infarction (MI) areas. Subsequently, the segmentation task is performed on the LGE style image. An end-to-end super-resolution segmentation model is introduced to generate high-resolution mask from low-resolution LGE style DTI image. Further, to enhance the performance of the model, a multi-task self-supervised learning strategy is employed to pre-train the super-resolution segmentation model, allowing it to acquire more representative knowledge and improve its segmentation performance after fine-tuning. https: github.com/wlc2424762917/Med_Img

IVJun 21, 2022
Faster Diffusion Cardiac MRI with Deep Learning-based breath hold reduction

Michael Tanzer, Pedro Ferreira, Andrew Scott et al.

Diffusion Tensor Cardiac Magnetic Resonance (DT-CMR) enables us to probe the microstructural arrangement of cardiomyocytes within the myocardium in vivo and non-invasively, which no other imaging modality allows. This innovative technology could revolutionise the ability to perform cardiac clinical diagnosis, risk stratification, prognosis and therapy follow-up. However, DT-CMR is currently inefficient with over six minutes needed to acquire a single 2D static image. Therefore, DT-CMR is currently confined to research but not used clinically. We propose to reduce the number of repetitions needed to produce DT-CMR datasets and subsequently de-noise them, decreasing the acquisition time by a linear factor while maintaining acceptable image quality. Our proposed approach, based on Generative Adversarial Networks, Vision Transformers, and Ensemble Learning, performs significantly and considerably better than previous proposed approaches, bringing single breath-hold DT-CMR closer to reality.

CVMay 21Code
GenHAR: Generalizing Cross-domain Human Activity Recognition for Last-mile Delivery

Zhiqing Hong, Zelong Li, Xiubin Fan et al.

Human Activity Recognition (HAR) has shown remarkable effectiveness in various applications, such as smart healthcare and intelligent manufacturing. However, a major challenge faced by HAR is the distribution shift across different sensor data domains, which often leads to decreased performance when deployed for real-world applications. To address this issue, this paper introduces GenHAR, a novel framework designed to mitigate the domain gap by learning domain-invariant sensor representations. GenHAR aims to enhance the generalization capabilities of HAR on target domains purely with data from the source domain. The key novelty of GenHAR lies in two aspects. Firstly, GenHAR tokenizes sensor data and learns correlations among frequency sensor channel dimensions to improve the robustness of HAR models. Secondly, GenHAR improves the efficiency via selective masking and an efficient attention mechanism. We conduct a systematic analysis of GenHAR by comparing it with state-of-the-art HAR methods on real-world human activity datasets. Results show that GenHAR outperforms state-of-the-art methods by 9.97% in accuracy, and reduces Floating Point Operations by 6.4 times. Moreover, we deploy GenHAR at a leading logistics company in 4 cities, and have detected 2.15 billion real-time activities. We release our code at: https://github.com/Sensor-FoundationModel/GenHAR.

IVJan 27, 2023
Hierarchical Perception Adversarial Learning Framework for Compressed Sensing MRI

Zhifan Gao, Yifeng Guo, Jiajing Zhang et al.

The long acquisition time has limited the accessibility of magnetic resonance imaging (MRI) because it leads to patient discomfort and motion artifacts. Although several MRI techniques have been proposed to reduce the acquisition time, compressed sensing in magnetic resonance imaging (CS-MRI) enables fast acquisition without compromising SNR and resolution. However, existing CS-MRI methods suffer from the challenge of aliasing artifacts. This challenge results in the noise-like textures and missing the fine details, thus leading to unsatisfactory reconstruction performance. To tackle this challenge, we propose a hierarchical perception adversarial learning framework (HP-ALF). HP-ALF can perceive the image information in the hierarchical mechanism: image-level perception and patch-level perception. The former can reduce the visual perception difference in the entire image, and thus achieve aliasing artifact removal. The latter can reduce this difference in the regions of the image, and thus recover fine details. Specifically, HP-ALF achieves the hierarchical mechanism by utilizing multilevel perspective discrimination. This discrimination can provide the information from two perspectives (overall and regional) for adversarial learning. It also utilizes a global and local coherent discriminator to provide structure information to the generator during training. In addition, HP-ALF contains a context-aware learning block to effectively exploit the slice information between individual images for better reconstruction performance. The experiments validated on three datasets demonstrate the effectiveness of HP-ALF and its superiority to the comparative methods.

IVSep 20, 2022
Review of data types and model dimensionality for cardiac DTI SMS-related artefact removal

Michael Tanzer, Sea Hee Yook, Guang Yang et al.

As diffusion tensor imaging (DTI) gains popularity in cardiac imaging due to its unique ability to non-invasively assess the cardiac microstructure, deep learning-based Artificial Intelligence is becoming a crucial tool in mitigating some of its drawbacks, such as the long scan times. As it often happens in fast-paced research environments, a lot of emphasis has been put on showing the capability of deep learning while often not enough time has been spent investigating what input and architectural properties would benefit cardiac DTI acceleration the most. In this work, we compare the effect of several input types (magnitude images vs complex images), multiple dimensionalities (2D vs 3D operations), and multiple input types (single slice vs multi-slice) on the performance of a model trained to remove artefacts caused by a simultaneous multi-slice (SMS) acquisition. Despite our initial intuition, our experiments show that, for a fixed number of parameters, simpler 2D real-valued models outperform their more advanced 3D or complex counterparts. The best performance is although obtained by a real-valued model trained using both the magnitude and phase components of the acquired data. We believe this behaviour to be due to real-valued models making better use of the lower number of parameters, and to 3D models not being able to exploit the spatial information because of the low SMS acceleration factor used in our experiments.

IVJan 23, 2023
ViGU: Vision GNN U-Net for Fast MRI

Jiahao Huang, Angelica Aviles-Rivero, Carola-Bibiane Schonlieb et al.

Deep learning models have been widely applied for fast MRI. The majority of existing deep learning models, e.g., convolutional neural networks, work on data with Euclidean or regular grids structures. However, high-dimensional features extracted from MR data could be encapsulated in non-Euclidean manifolds. This disparity between the go-to assumption of existing models and data requirements limits the flexibility to capture irregular anatomical features in MR data. In this work, we introduce a novel Vision GNN type network for fast MRI called Vision GNN U-Net (ViGU). More precisely, the pixel array is first embedded into patches and then converted into a graph. Secondly, a U-shape network is developed using several graph blocks in symmetrical encoder and decoder paths. Moreover, we show that the proposed ViGU can also benefit from Generative Adversarial Networks yielding to its variant ViGU-GAN. We demonstrate, through numerical and visual experiments, that the proposed ViGU and GAN variant outperform existing CNN and GAN-based methods. Moreover, we show that the proposed network readily competes with approaches based on Transformers while requiring a fraction of the computational cost. More importantly, the graph structure of the network reveals how the network extracts features from MR images, providing intuitive explainability.

IVMar 11, 2022
Automatic Fine-grained Glomerular Lesion Recognition in Kidney Pathology

Yang Nan, Fengyi Li, Peng Tang et al.

Recognition of glomeruli lesions is the key for diagnosis and treatment planning in kidney pathology; however, the coexisting glomerular structures such as mesangial regions exacerbate the difficulties of this task. In this paper, we introduce a scheme to recognize fine-grained glomeruli lesions from whole slide images. First, a focal instance structural similarity loss is proposed to drive the model to locate all types of glomeruli precisely. Then an Uncertainty Aided Apportionment Network is designed to carry out the fine-grained visual classification without bounding-box annotations. This double branch-shaped structure extracts common features of the child class from the parent class and produces the uncertainty factor for reconstituting the training dataset. Results of slide-wise evaluation illustrate the effectiveness of the entire scheme, with an 8-22% improvement of the mean Average Precision compared with remarkable detection methods. The comprehensive results clearly demonstrate the effectiveness of the proposed method.

CVJul 11, 2023
$\mathrm{SAM^{Med}}$: A medical image annotation framework based on large vision model

Chenglong Wang, Dexuan Li, Sucheng Wang et al.

Recently, large vision model, Segment Anything Model (SAM), has revolutionized the computer vision field, especially for image segmentation. SAM presented a new promptable segmentation paradigm that exhibit its remarkable zero-shot generalization ability. An extensive researches have explore the potential and limits of SAM in various downstream tasks. In this study, we presents $\mathrm{SAM^{Med}}$, an enhanced framework for medical image annotation that leverages the capabilities of SAM. $\mathrm{SAM^{Med}}$ framework consisted of two submodules, namely $\mathrm{SAM^{assist}}$ and $\mathrm{SAM^{auto}}$. The $\mathrm{SAM^{assist}}$ demonstrates the generalization ability of SAM to the downstream medical segmentation task using the prompt-learning approach. Results show a significant improvement in segmentation accuracy with only approximately 5 input points. The $\mathrm{SAM^{auto}}$ model aims to accelerate the annotation process by automatically generating input prompts. The proposed SAP-Net model achieves superior segmentation performance with only five annotated slices, achieving an average Dice coefficient of 0.80 and 0.82 for kidney and liver segmentation, respectively. Overall, $\mathrm{SAM^{Med}}$ demonstrates promising results in medical image annotation. These findings highlight the potential of leveraging large-scale vision models in medical image annotation tasks.

SYMar 9, 2019
Continuous-time Signal Temporal Logic Planning with Control Barrier Function

Guang Yang, Roberto Tron, Calin Belta

Temporal Logic (TL) guided control problems have gained interests in recent years. By using the TL, one can specify a wide range of temporal constraints on the system and is widely used in cyber-physical systems. On the other hand, Control Barrier Functions have also gained interests in the context of safety critical applications. However, most of the existing approaches only focus on discrete-time dynamical systems. In this paper, we propose an offline trajectory planner for linear systems subject to safety and temporal specifications. Such specifications can be expressed as logical junctions or disjunctions of linear CBFs, or as STL specifications with linear predicates. Our planner produces trajectories that are valid in continuous time, while assuming only discrete-time control updates and arbitrary time interval in the STL formula. Our planner is based on a Mixed Integer Quadratic Programming (MIQP) formulation, where the linear STL predicates are encoded as set of linear constraints to guarantee satisfaction at on a finite discrete set of time instants, while we use CBFs to derive constraints that guarantee continuous satisfaction between time instants. Moreover, we have shown the predicates can be encoded as time-based CBF constraints for system with any relative degrees. We validate our theoretical results and formulation through numerical simulations.

LGJun 14, 2023
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning

Yunlu Yan, Huazhu Fu, Yuexiang Li et al.

Federated Learning (FL) facilitates collaborative learning among multiple clients in a distributed manner and ensures the security of privacy. However, its performance inevitably degrades with non-Independent and Identically Distributed (non-IID) data. In this paper, we focus on the feature distribution skewed FL scenario, a common non-IID situation in real-world applications where data from different clients exhibit varying underlying distributions. This variation leads to feature shift, which is a key issue of this scenario. While previous works have made notable progress, few pay attention to the data itself, i.e., the root of this issue. The primary goal of this paper is to mitigate feature shift from the perspective of data. To this end, we propose a simple yet remarkably effective input-level data augmentation method, namely FedRDN, which randomly injects the statistical information of the local distribution from the entire federation into the client's data. This is beneficial to improve the generalization of local feature representations, thereby mitigating feature shift. Moreover, our FedRDN is a plug-and-play component, which can be seamlessly integrated into the data augmentation flow with only a few lines of code. Extensive experiments on several datasets show that the performance of various representative FL methods can be further improved by integrating our FedRDN, demonstrating its effectiveness, strong compatibility and generalizability. Code will be released.

IVOct 21, 2022
Adversarial Transformer for Repairing Human Airway Segmentation

Zeyu Tang, Nan Yang, Simon Walsh et al.

Discontinuity in the delineation of peripheral bronchioles hinders the potential clinical application of automated airway segmentation models. Moreover, the deployment of such models is limited by the data heterogeneity across different centres, and pathological abnormalities also make achieving accurate robust segmentation in distal small airways difficult. Meanwhile, the diagnosis and prognosis of lung diseases often rely on evaluating structural changes in those anatomical regions. To address this gap, this paper presents a patch-scale adversarial-based refinement network that takes in preliminary segmentation along with original CT images and outputs a refined mask of the airway structure. The method is validated on three different datasets encompassing healthy cases, cases with cystic fibrosis and cases with COVID-19. The results are quantitatively evaluated by seven metrics and achieved more than a 15% rise in detected length ratio and detected branch ratio, showing promising performance compared to previously proposed models. The visual illustration also proves our refinement guided by a patch-scale discriminator and centreline objective functions is effective in detecting discontinuities and missing bronchioles. Furthermore, the generalizability of our refinement pipeline is tested on three previous models and improves their segmentation completeness significantly.

IVAug 4, 2022
Unsupervised Tissue Segmentation via Deep Constrained Gaussian Network

Yang Nan, Peng Tang, Guyue Zhang et al.

Tissue segmentation is the mainstay of pathological examination, whereas the manual delineation is unduly burdensome. To assist this time-consuming and subjective manual step, researchers have devised methods to automatically segment structures in pathological images. Recently, automated machine and deep learning based methods dominate tissue segmentation research studies. However, most machine and deep learning based approaches are supervised and developed using a large number of training samples, in which the pixelwise annotations are expensive and sometimes can be impossible to obtain. This paper introduces a novel unsupervised learning paradigm by integrating an end-to-end deep mixture model with a constrained indicator to acquire accurate semantic tissue segmentation. This constraint aims to centralise the components of deep mixture models during the calculation of the optimisation function. In so doing, the redundant or empty class issues, which are common in current unsupervised learning methods, can be greatly reduced. By validation on both public and in-house datasets, the proposed deep constrained Gaussian network achieves significantly (Wilcoxon signed-rank test) better performance (with the average Dice scores of 0.737 and 0.735, respectively) on tissue segmentation with improved stability and robustness, compared to other existing unsupervised segmentation approaches. Furthermore, the proposed method presents a similar performance (p-value > 0.05) compared to the fully supervised U-Net.

CVAug 16, 2024Code
Beyond the Hype: A dispassionate look at vision-language models in medical scenario

Yang Nan, Huichi Zhou, Xiaodan Xing et al.

Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across diverse tasks, garnering significant attention in AI communities. However, their performance and reliability in specialized domains such as medicine remain insufficiently assessed. In particular, most assessments over-concentrate on evaluating VLMs based on simple Visual Question Answering (VQA) on multi-modality data, while ignoring the in-depth characteristics of LVLMs. In this study, we introduce RadVUQA, a novel Radiological Visual Understanding and Question Answering benchmark, to comprehensively evaluate existing LVLMs. RadVUQA mainly validates LVLMs across five dimensions: 1) Anatomical understanding, assessing the models' ability to visually identify biological structures; 2) Multimodal comprehension, which involves the capability of interpreting linguistic and visual instructions to produce desired outcomes; 3) Quantitative and spatial reasoning, evaluating the models' spatial awareness and proficiency in combining quantitative analysis with visual and linguistic information; 4) Physiological knowledge, measuring the models' capability to comprehend functions and mechanisms of organs and systems; and 5) Robustness, which assesses the models' capabilities against unharmonized and synthetic data. The results indicate that both generalized LVLMs and medical-specific LVLMs have critical deficiencies with weak multimodal comprehension and quantitative reasoning capabilities. Our findings reveal the large gap between existing LVLMs and clinicians, highlighting the urgent need for more robust and intelligent LVLMs. The code is available at https://github.com/Nandayang/RadVUQA

CVJan 23Code
StealthMark: Harmless and Stealthy Ownership Verification for Medical Segmentation via Uncertainty-Guided Backdoors

Qinkai Yu, Chong Zhang, Gaojie Jin et al.

Annotating medical data for training AI models is often costly and limited due to the shortage of specialists with relevant clinical expertise. This challenge is further compounded by privacy and ethical concerns associated with sensitive patient information. As a result, well-trained medical segmentation models on private datasets constitute valuable intellectual property requiring robust protection mechanisms. Existing model protection techniques primarily focus on classification and generative tasks, while segmentation models-crucial to medical image analysis-remain largely underexplored. In this paper, we propose a novel, stealthy, and harmless method, StealthMark, for verifying the ownership of medical segmentation models under black-box conditions. Our approach subtly modulates model uncertainty without altering the final segmentation outputs, thereby preserving the model's performance. To enable ownership verification, we incorporate model-agnostic explanation methods, e.g. LIME, to extract feature attributions from the model outputs. Under specific triggering conditions, these explanations reveal a distinct and verifiable watermark. We further design the watermark as a QR code to facilitate robust and recognizable ownership claims. We conducted extensive experiments across four medical imaging datasets and five mainstream segmentation models. The results demonstrate the effectiveness, stealthiness, and harmlessness of our method on the original model's segmentation performance. For example, when applied to the SAM model, StealthMark consistently achieved ASR above 95% across various datasets while maintaining less than a 1% drop in Dice and AUC scores, significantly outperforming backdoor-based watermarking methods and highlighting its strong potential for practical deployment. Our implementation code is made available at: https://github.com/Qinkaiyu/StealthMark.

IVMay 29
MoE-dqINR: A Unified Mixture-of-Experts Implicit Neural Representation Framework for Scan-Specific Dynamic and Quantitative MRI Reconstruction

Yinzhe Wu, Fanwen Wang, Zhenxuan Zhang et al.

Undersampled magnetic resonance imaging (MRI) reconstruction seeks to recover temporally or contrast-varying image series from incomplete multicoil k-space data while preserving state-dependent fidelity for dynamic and quantitative MRI (qMRI). Existing scan-specific implicit neural representations (INRs) often use monolithic spatiotemporal coordinate fields, explicit subspaces, motion or deformation models, calibration variables, or sequence-specific quantitative signal models. These design choices can limit flexibility in sharing spatial information while adapting image synthesis across acquisition states. Moreover, many INR-based baselines remain computationally demanding, typically requiring per-scan optimization times on the order of hundreds to thousands of seconds. We propose MoE-dqINR, a scan-specific multicoil MRI reconstruction framework that factorizes the image-domain representation into shared spatial experts and a state-conditioned routing pathway. Spatial experts encode reusable coordinate-dependent image content, whereas routing weights, conditioned on ordered acquisition states, synthesize each dynamic frame or contrast state from a common expert bank. The representation is coupled to a multicoil MRI forward model, uses the normalized state index to drive routing in both dynamic and quantitative MRI. By separating shared spatial representation from state-dependent synthesis, the framework provides an image-first architecture for dynamic and quantitative MRI while reducing scan-specific INR optimization to approximately 30 s per scan in our experiments. The proposed formulation establishes state-conditioned mixture-of-experts INR as a scan-specific multicoil MRI reconstruction prior that unifies shared spatial representation, dynamic- and qMRI-specific synthesis, and practical per-scan efficiency.

LGAug 8, 2024
Advancing oncology with federated learning: transcending boundaries in breast, lung, and prostate cancer. A systematic review

Anshu Ankolekar, Sebastian Boie, Maryam Abdollahyan et al.

Federated Learning (FL) has emerged as a promising solution to address the limitations of centralised machine learning (ML) in oncology, particularly in overcoming privacy concerns and harnessing the power of diverse, multi-center data. This systematic review synthesises current knowledge on the state-of-the-art FL in oncology, focusing on breast, lung, and prostate cancer. Distinct from previous surveys, our comprehensive review critically evaluates the real-world implementation and impact of FL on cancer care, demonstrating its effectiveness in enhancing ML generalisability, performance and data privacy in clinical settings and data. We evaluated state-of-the-art advances in FL, demonstrating its growing adoption amid tightening data privacy regulations. FL outperformed centralised ML in 15 out of the 25 studies reviewed, spanning diverse ML models and clinical applications, and facilitating integration of multi-modal information for precision medicine. Despite the current challenges identified in reproducibility, standardisation and methodology across studies, the demonstrable benefits of FL in harnessing real-world data and addressing clinical needs highlight its significant potential for advancing cancer research. We propose that future research should focus on addressing these limitations and investigating further advanced FL methods, to fully harness data diversity and realise the transformative power of cutting-edge FL in cancer care.

CVJan 31, 2023
AMD: Adaptive Masked Distillation for Object Detection

Guang Yang, Yin Tang, Jun Li et al.

As a general model compression paradigm, feature-based knowledge distillation allows the student model to learn expressive features from the teacher counterpart. In this paper, we mainly focus on designing an effective feature-distillation framework and propose a spatial-channel adaptive masked distillation (AMD) network for object detection. More specifically, in order to accurately reconstruct important feature regions, we first perform attention-guided feature masking on the feature map of the student network, such that we can identify the important features via spatially adaptive feature masking instead of random masking in the previous methods. In addition, we employ a simple and efficient module to allow the student network channel to be adaptive, improving its model capability in object perception and detection. In contrast to the previous methods, more crucial object-aware features can be reconstructed and learned from the proposed network, which is conducive to accurate object detection. The empirical experiments demonstrate the superiority of our method: with the help of our proposed distillation method, the student networks report 41.3%, 42.4%, and 42.7% mAP scores when RetinaNet, Cascade Mask-RCNN and RepPoints are respectively used as the teacher framework for object detection, which outperforms the previous state-of-the-art distillation methods including FGD and MGD.

CVMar 17, 2022
DRAG: Dynamic Region-Aware GCN for Privacy-Leaking Image Detection

Guang Yang, Juan Cao, Qiang Sheng et al.

The daily practice of sharing images on social media raises a severe issue about privacy leakage. To address the issue, privacy-leaking image detection is studied recently, with the goal to automatically identify images that may leak privacy. Recent advance on this task benefits from focusing on crucial objects via pretrained object detectors and modeling their correlation. However, these methods have two limitations: 1) they neglect other important elements like scenes, textures, and objects beyond the capacity of pretrained object detectors; 2) the correlation among objects is fixed, but a fixed correlation is not appropriate for all the images. To overcome the limitations, we propose the Dynamic Region-Aware Graph Convolutional Network (DRAG) that dynamically finds out crucial regions including objects and other important elements, and models their correlation adaptively for each input image. To find out crucial regions, we cluster spatially-correlated feature channels into several region-aware feature maps. Further, we dynamically model the correlation with the self-attention mechanism and explore the interaction among the regions with a graph convolutional network. The DRAG achieved an accuracy of 87% on the largest dataset for privacy-leaking image detection, which is 10 percentage points higher than the state of the art. The further case study demonstrates that it found out crucial regions containing not only objects but other important elements like textures.

CVJul 24, 2023
Is attention all you need in medical image analysis? A review

Giorgos Papanastasiou, Nikolaos Dikaios, Jiahao Huang et al.

Medical imaging is a key component in clinical diagnosis, treatment planning and clinical trial design, accounting for almost 90% of all healthcare data. CNNs achieved performance gains in medical image analysis (MIA) over the last years. CNNs can efficiently model local pixel interactions and be trained on small-scale MI data. The main disadvantage of typical CNN models is that they ignore global pixel relationships within images, which limits their generalisation ability to understand out-of-distribution data with different 'global' information. The recent progress of Artificial Intelligence gave rise to Transformers, which can learn global relationships from data. However, full Transformer models need to be trained on large-scale data and involve tremendous computational complexity. Attention and Transformer compartments (Transf/Attention) which can well maintain properties for modelling global relationships, have been proposed as lighter alternatives of full Transformers. Recently, there is an increasing trend to co-pollinate complementary local-global properties from CNN and Transf/Attention architectures, which led to a new era of hybrid models. The past years have witnessed substantial growth in hybrid CNN-Transf/Attention models across diverse MIA problems. In this systematic review, we survey existing hybrid CNN-Transf/Attention models, review and unravel key architectural designs, analyse breakthroughs, and evaluate current and future opportunities as well as challenges. We also introduced a comprehensive analysis framework on generalisation opportunities of scientific and clinical impact, based on which new data-driven domain generalisation and adaptation methods can be stimulated.

LGJul 9, 2022
Explainable AI (XAI) in Biomedical Signal and Image Processing: Promises and Challenges

Guang Yang, Arvind Rao, Christine Fernandez-Maloigne et al.

Artificial intelligence has become pervasive across disciplines and fields, and biomedical image and signal processing is no exception. The growing and widespread interest on the topic has triggered a vast research activity that is reflected in an exponential research effort. Through study of massive and diverse biomedical data, machine and deep learning models have revolutionized various tasks such as modeling, segmentation, registration, classification and synthesis, outperforming traditional techniques. However, the difficulty in translating the results into biologically/clinically interpretable information is preventing their full exploitation in the field. Explainable AI (XAI) attempts to fill this translational gap by providing means to make the models interpretable and providing explanations. Different solutions have been proposed so far and are gaining increasing interest from the community. This paper aims at providing an overview on XAI in biomedical data processing and points to an upcoming Special Issue on Deep Learning in Biomedical Image and Signal Processing of the IEEE Signal Processing Magazine that is going to appear in March 2022.

IVJul 3, 2024
Probing Perfection: The Relentless Art of Meddling for Pulmonary Airway Segmentation from HRCT via a Human-AI Collaboration Based Active Learning Method

Shiyi Wang, Yang Nan, Sheng Zhang et al.

In pulmonary tracheal segmentation, the scarcity of annotated data is a prevalent issue in medical segmentation. Additionally, Deep Learning (DL) methods face challenges: the opacity of 'black box' models and the need for performance enhancement. Our Human-Computer Interaction (HCI) based models (RS_UNet, LC_UNet, UUNet, and WD_UNet) address these challenges by combining diverse query strategies with various DL models. We train four HCI models and repeat these steps: (1) Query Strategy: The HCI models select samples that provide the most additional representative information when labeled in each iteration and identify unlabeled samples with the greatest predictive disparity using Wasserstein Distance, Least Confidence, Entropy Sampling, and Random Sampling. (2) Central line correction: Selected samples are used for expert correction of system-generated tracheal central lines in each training round. (3) Update training dataset: Experts update the training dataset after each DL model's training epoch, enhancing the trustworthiness and performance of the models. (4) Model training: The HCI model is trained using the updated dataset and an enhanced UNet version. Experimental results confirm the effectiveness of these HCI-based approaches, showing that WD-UNet, LC-UNet, UUNet, and RS-UNet achieve comparable or superior performance to state-of-the-art DL models. Notably, WD-UNet achieves this with only 15%-35% of the training data, reducing physician annotation time by 65%-85%.