CVApr 22, 2022
Dynamic Prototype Convolution Network for Few-Shot Semantic SegmentationJie Liu, Yanqi Bao, Guo-Sen Xie et al.
The key challenge for few-shot semantic segmentation (FSS) is how to tailor a desirable interaction among support and query features and/or their prototypes, under the episodic training scenario. Most existing FSS methods implement such support-query interactions by solely leveraging plain operations - e.g., cosine similarity and feature concatenation - for segmenting the query objects. However, these interaction approaches usually cannot well capture the intrinsic object details in the query images that are widely encountered in FSS, e.g., if the query object to be segmented has holes and slots, inaccurate segmentation almost always happens. To this end, we propose a dynamic prototype convolution network (DPCN) to fully capture the aforementioned intrinsic details for accurate FSS. Specifically, in DPCN, a dynamic convolution module (DCM) is firstly proposed to generate dynamic kernels from support foreground, then information interaction is achieved by convolution operations over query features using these kernels. Moreover, we equip DPCN with a support activation module (SAM) and a feature filtering module (FFM) to generate pseudo mask and filter out background information for the query images, respectively. SAM and FFM together can mine enriched context information from the query features. Our DPCN is also flexible and efficient under the k-shot FSS setting. Extensive experiments on PASCAL-5i and COCO-20i show that DPCN yields superior performances under both 1-shot and 5-shot settings.
CVJan 25, 2023
Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNNDavid M. Knigge, David W. Romero, Albert Gu et al.
Performant Convolutional Neural Network (CNN) architectures must be tailored to specific tasks in order to consider the length, resolution, and dimensionality of the input data. In this work, we tackle the need for problem-specific CNN architectures. We present the Continuous Convolutional Neural Network (CCNN): a single CNN able to process data of arbitrary resolution, dimensionality and length without any structural changes. Its key component are its continuous convolutional kernels which model long-range dependencies at every layer, and thus remove the need of current CNN architectures for task-dependent downsampling and depths. We showcase the generality of our method by using the same architecture for tasks on sequential ($1{\rm D}$), visual ($2{\rm D}$) and point-cloud ($3{\rm D}$) data. Our CCNN matches and often outperforms the current state-of-the-art across all tasks considered.
IVJan 20, 2023
On Retrospective k-space Subsampling schemes For Deep MRI ReconstructionGeorge Yiasemis, Clara I. Sánchez, Jan-Jakob Sonke et al.
Acquiring fully-sampled MRI $k$-space data is time-consuming, and collecting accelerated data can reduce the acquisition time. Employing 2D Cartesian-rectilinear subsampling schemes is a conventional approach for accelerated acquisitions; however, this often results in imprecise reconstructions, even with the use of Deep Learning (DL), especially at high acceleration factors. Non-rectilinear or non-Cartesian trajectories can be implemented in MRI scanners as alternative subsampling options. This work investigates the impact of the $k$-space subsampling scheme on the quality of reconstructed accelerated MRI measurements produced by trained DL models. The Recurrent Variational Network (RecurrentVarNet) was used as the DL-based MRI-reconstruction architecture. Cartesian, fully-sampled multi-coil $k$-space measurements from three datasets were retrospectively subsampled with different accelerations using eight distinct subsampling schemes: four Cartesian-rectilinear, two Cartesian non-rectilinear, and two non-Cartesian. Experiments were conducted in two frameworks: scheme-specific, where a distinct model was trained and evaluated for each dataset-subsampling scheme pair, and multi-scheme, where for each dataset a single model was trained on data randomly subsampled by any of the eight schemes and evaluated on data subsampled by all schemes. In both frameworks, RecurrentVarNets trained and evaluated on non-rectilinearly subsampled data demonstrated superior performance, particularly for high accelerations. In the multi-scheme setting, reconstruction performance on rectilinearly subsampled data improved when compared to the scheme-specific experiments. Our findings demonstrate the potential for using DL-based methods, trained on non-rectilinearly subsampled measurements, to optimize scan time and image quality.
IVSep 18, 2023
vSHARP: variable Splitting Half-quadratic Admm algorithm for Reconstruction of inverse-ProblemsGeorge Yiasemis, Nikita Moriakov, Jan-Jakob Sonke et al.
Medical Imaging (MI) tasks, such as accelerated parallel Magnetic Resonance Imaging (MRI), often involve reconstructing an image from noisy or incomplete measurements. This amounts to solving ill-posed inverse problems, where a satisfactory closed-form analytical solution is not available. Traditional methods such as Compressed Sensing (CS) in MRI reconstruction can be time-consuming or prone to obtaining low-fidelity images. Recently, a plethora of Deep Learning (DL) approaches have demonstrated superior performance in inverse-problem solving, surpassing conventional methods. In this study, we propose vSHARP (variable Splitting Half-quadratic ADMM algorithm for Reconstruction of inverse Problems), a novel DL-based method for solving ill-posed inverse problems arising in MI. vSHARP utilizes the Half-Quadratic Variable Splitting method and employs the Alternating Direction Method of Multipliers (ADMM) to unroll the optimization process. For data consistency, vSHARP unrolls a differentiable gradient descent process in the image domain, while a DL-based denoiser, such as a U-Net architecture, is applied to enhance image quality. vSHARP also employs a dilated-convolution DL-based model to predict the Lagrange multipliers for the ADMM initialization. We evaluate vSHARP on tasks of accelerated parallel MRI Reconstruction using two distinct datasets and on accelerated parallel dynamic MRI Reconstruction using another dataset. Our comparative analysis with state-of-the-art methods demonstrates the superior performance of vSHARP in these applications.
CVNov 20, 2023
Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation AlgorithmsJoren Brunekreef, Eric Marcus, Ray Sheombarsing et al.
Image segmentation algorithms can be understood as a collection of pixel classifiers, for which the outcomes of nearby pixels are correlated. Classifier models can be calibrated using Inductive Conformal Prediction, but this requires holding back a sufficiently large calibration dataset for computing the distribution of non-conformity scores of the model's predictions. If one only requires only marginal calibration on the image level, this calibration set consists of all individual pixels in the images available for calibration. However, if the goal is to attain proper calibration for each individual pixel classifier, the calibration set consists of individual images. In a scenario where data are scarce (such as the medical domain), it may not always be possible to set aside sufficiently many images for this pixel-level calibration. The method we propose, dubbed ``Kandinsky calibration'', makes use of the spatial structure present in the distribution of natural images to simultaneously calibrate the classifiers of ``similar'' pixels. This can be seen as an intermediate approach between marginal (imagewise) and conditional (pixelwise) calibration, where non-conformity scores are aggregated over similar image regions, thereby making more efficient use of the images available for calibration. We run experiments on segmentation algorithms trained and calibrated on subsets of the public MS-COCO and Medical Decathlon datasets, demonstrating that Kandinsky calibration method can significantly improve the coverage. When compared to both pixelwise and imagewise calibration on little data, the Kandinsky method achieves much lower coverage errors, indicating the data efficiency of the Kandinsky calibration.
IVOct 10, 2023
Deep Cardiac MRI Reconstruction with ADMMGeorge Yiasemis, Nikita Moriakov, Jan-Jakob Sonke et al.
Cardiac magnetic resonance imaging is a valuable non-invasive tool for identifying cardiovascular diseases. For instance, Cine MRI is the benchmark modality for assessing the cardiac function and anatomy. On the other hand, multi-contrast (T1 and T2) mapping has the potential to assess pathologies and abnormalities in the myocardium and interstitium. However, voluntary breath-holding and often arrhythmia, in combination with MRI's slow imaging speed, can lead to motion artifacts, hindering real-time acquisition image quality. Although performing accelerated acquisitions can facilitate dynamic imaging, it induces aliasing, causing low reconstructed image quality in Cine MRI and inaccurate T1 and T2 mapping estimation. In this work, inspired by related work in accelerated MRI reconstruction, we present a deep learning (DL)-based method for accelerated cine and multi-contrast reconstruction in the context of dynamic cardiac imaging. We formulate the reconstruction problem as a least squares regularized optimization task, and employ vSHARP, a state-of-the-art DL-based inverse problem solver, which incorporates half-quadratic variable splitting and the alternating direction method of multipliers with neural networks. We treat the problem in two setups; a 2D reconstruction and a 2D dynamic reconstruction task, and employ 2D and 3D deep learning networks, respectively. Our method optimizes in both the image and k-space domains, allowing for high reconstruction fidelity. Although the target data is undersampled with a Cartesian equispaced scheme, we train our model using both Cartesian and simulated non-Cartesian undersampling schemes to enhance generalization of the model to unseen data. Furthermore, our model adopts a deep neural network to learn and refine the sensitivity maps of multi-coil k-space data. Lastly, our method is jointly trained on both, undersampled cine and multi-contrast data.
CVJan 9, 2023
Few-shot Semantic Segmentation with Support-induced Graph Convolutional NetworkJie Liu, Yanqi Bao, Wenzhe Yin et al.
Few-shot semantic segmentation (FSS) aims to achieve novel objects segmentation with only a few annotated samples and has made great progress recently. Most of the existing FSS models focus on the feature matching between support and query to tackle FSS. However, the appearance variations between objects from the same category could be extremely large, leading to unreliable feature matching and query mask prediction. To this end, we propose a Support-induced Graph Convolutional Network (SiGCN) to explicitly excavate latent context structure in query images. Specifically, we propose a Support-induced Graph Reasoning (SiGR) module to capture salient query object parts at different semantic levels with a Support-induced GCN. Furthermore, an instance association (IA) module is designed to capture high-order instance context from both support and query instances. By integrating the proposed two modules, SiGCN can learn rich query context representation, and thus being more robust to appearance variations. Extensive experiments on PASCAL-5i and COCO-20i demonstrate that our SiGCN achieves state-of-the-art performance.
IVNov 27, 2023
Joint Supervised and Self-supervised Learning for MRI ReconstructionGeorge Yiasemis, Nikita Moriakov, Clara I. Sánchez et al.
Magnetic Resonance Imaging (MRI) represents an important diagnostic modality; however, its inherently slow acquisition process poses challenges in obtaining fully-sampled $k$-space data under motion. In the absence of fully-sampled acquisitions, serving as ground truths, training deep learning algorithms in a supervised manner to predict the underlying ground truth image becomes challenging. To address this limitation, self-supervised methods have emerged as a viable alternative, leveraging available subsampled $k$-space data to train deep neural networks for MRI reconstruction. Nevertheless, these approaches often fall short when compared to supervised methods. We propose Joint Supervised and Self-supervised Learning (JSSL), a novel training approach for deep learning-based MRI reconstruction algorithms aimed at enhancing reconstruction quality in cases where target datasets containing fully-sampled $k$-space measurements are unavailable. JSSL operates by simultaneously training a model in a self-supervised learning setting, using subsampled data from the target dataset(s), and in a supervised learning manner, utilizing datasets with fully-sampled $k$-space data, referred to as proxy datasets. We demonstrate JSSL's efficacy using subsampled prostate or cardiac MRI data as the target datasets, with fully-sampled brain and knee, or brain, knee and prostate $k$-space acquisitions, respectively, as proxy datasets. Our results showcase substantial improvements over conventional self-supervised methods, validated using common image quality metrics. Furthermore, we provide theoretical motivations for JSSL and establish "rule-of-thumb" guidelines for training MRI reconstruction models. JSSL effectively enhances MRI reconstruction quality in scenarios where fully-sampled $k$-space data is not available, leveraging the strengths of supervised learning by incorporating proxy datasets.
IVJul 17, 2023
Neural Modulation Fields for Conditional Cone Beam Neural TomographySamuele Papa, David M. Knigge, Riccardo Valperga et al.
Conventional Computed Tomography (CT) methods require large numbers of noise-free projections for accurate density reconstructions, limiting their applicability to the more complex class of Cone Beam Geometry CT (CBCT) reconstruction. Recently, deep learning methods have been proposed to overcome these limitations, with methods based on neural fields (NF) showing strong performance, by approximating the reconstructed density through a continuous-in-space coordinate based neural network. Our focus is on improving such methods, however, unlike previous work, which requires training an NF from scratch for each new set of projections, we instead propose to leverage anatomical consistencies over different scans by training a single conditional NF on a dataset of projections. We propose a novel conditioning method where local modulations are modeled per patient as a field over the input domain through a Neural Modulation Field (NMF). The resulting Conditional Cone Beam Neural Tomography (CondCBNT) shows improved performance for both high and low numbers of available projections on noise-free and noisy data.
LGFeb 9, 2023
Constrained Empirical Risk Minimization: Theory and PracticeEric Marcus, Ray Sheombarsing, Jan-Jakob Sonke et al.
Deep Neural Networks (DNNs) are widely used for their ability to effectively approximate large classes of functions. This flexibility, however, makes the strict enforcement of constraints on DNNs an open problem. Here we present a framework that, under mild assumptions, allows the exact enforcement of constraints on parameterized sets of functions such as DNNs. Instead of imposing "soft'' constraints via additional terms in the loss, we restrict (a subset of) the DNN parameters to a submanifold on which the constraints are satisfied exactly throughout the entire training procedure. We focus on constraints that are outside the scope of equivariant networks used in Geometric Deep Learning. As a major example of the framework, we restrict filters of a Convolutional Neural Network (CNN) to be wavelets, and apply these wavelet networks to the task of contour prediction in the medical domain.
LGFeb 10
Towards Uniformity and Alignment for Multimodal Representation LearningWenzhe Yin, Pan Zhou, Zehao Xiao et al.
Multimodal representation learning aims to construct a shared embedding space in which heterogeneous modalities are semantically aligned. Despite strong empirical results, InfoNCE-based objectives introduce inherent conflicts that yield distribution gaps across modalities. In this work, we identify two conflicts in the multimodal regime, both exacerbated as the number of modalities increases: (i) an alignment-uniformity conflict, whereby the repulsion of uniformity undermines pairwise alignment, and (ii) an intra-alignment conflict, where aligning multiple modalities induces competing alignment directions. To address these issues, we propose a principled decoupling of alignment and uniformity for multimodal representations, providing a conflict-free recipe for multimodal learning that simultaneously supports discriminative and generative use cases without task-specific modules. We then provide a theoretical guarantee that our method acts as an efficient proxy for a global Hölder divergence over multiple modality distributions, and thus reduces the distribution gap among modalities. Extensive experiments on retrieval and UnCLIP-style generation demonstrate consistent gains.
CVDec 16, 2023Code
How to Train Neural Field Representations: A Comprehensive Study and BenchmarkSamuele Papa, Riccardo Valperga, David Knigge et al.
Neural fields (NeFs) have recently emerged as a versatile method for modeling signals of various modalities, including images, shapes, and scenes. Subsequently, a number of works have explored the use of NeFs as representations for downstream tasks, e.g. classifying an image based on the parameters of a NeF that has been fit to it. However, the impact of the NeF hyperparameters on their quality as downstream representation is scarcely understood and remains largely unexplored. This is in part caused by the large amount of time required to fit datasets of neural fields. In this work, we propose a JAX-based library that leverages parallelization to enable fast optimization of large-scale NeF datasets, resulting in a significant speed-up. With this library, we perform a comprehensive study that investigates the effects of different hyperparameters on fitting NeFs for downstream tasks. In particular, we explore the use of a shared initialization, the effects of overtraining, and the expressiveness of the network architectures used. Our study provides valuable insights on how to train NeFs and offers guidance for optimizing their effectiveness in downstream applications. Finally, based on the proposed library and our analysis, we propose Neural Field Arena, a benchmark consisting of neural field variants of popular vision datasets, including MNIST, CIFAR, variants of ImageNet, and ShapeNetv2. Our library and the Neural Field Arena will be open-sourced to introduce standardized benchmarking and promote further research on neural fields.
MED-PHDec 24, 2025
Equivariant Multiscale Learned Invertible Reconstruction for Cone Beam CT: From Simulated to Real DataNikita Moriakov, Efstratios Gavves, Jonathan H. Mason et al.
Cone Beam CT (CBCT) is an important imaging modality nowadays, however lower image quality of CBCT compared to more conventional Computed Tomography (CT) remains a limiting factor in CBCT applications. Deep learning reconstruction methods are a promising alternative to classical analytical and iterative reconstruction methods, but applying such methods to CBCT is often difficult due to the lack of ground truth data, memory limitations and the need for fast inference at clinically-relevant resolutions. In this work we propose LIRE++, an end-to-end rotationally-equivariant multiscale learned invertible primal-dual scheme for fast and memory-efficient CBCT reconstruction. Memory optimizations and multiscale reconstruction allow for fast training and inference, while rotational equivariance improves parameter efficiency. LIRE++ was trained on simulated projection data from a fast quasi-Monte Carlo CBCT projection simulator that we developed as well. Evaluated on synthetic data, LIRE++ gave an average improvement of 1 dB in Peak Signal-to-Noise Ratio over alternative deep learning baselines. On real clinical data, LIRE++ improved the average Mean Absolute Error between the reconstruction and the corresponding planning CT by 10 Hounsfield Units with respect to current proprietary state-of-the-art hybrid deep-learning/iterative method.
AINov 7, 2024Code
CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent CooperationJie Liu, Pan Zhou, Yingjun Du et al.
In this work, we address the cooperation problem among large language model (LLM) based embodied agents, where agents must cooperate to achieve a common goal. Previous methods often execute actions extemporaneously and incoherently, without long-term strategic and cooperative planning, leading to redundant steps, failures, and even serious repercussions in complex tasks like search-and-rescue missions where discussion and cooperative plan are crucial. To solve this issue, we propose Cooperative Plan Optimization (CaPo) to enhance the cooperation efficiency of LLM-based embodied agents. Inspired by human cooperation schemes, CaPo improves cooperation efficiency with two phases: 1) meta-plan generation, and 2) progress-adaptive meta-plan and execution. In the first phase, all agents analyze the task, discuss, and cooperatively create a meta-plan that decomposes the task into subtasks with detailed steps, ensuring a long-term strategic and coherent plan for efficient coordination. In the second phase, agents execute tasks according to the meta-plan and dynamically adjust it based on their latest progress (e.g., discovering a target object) through multi-turn discussions. This progress-based adaptation eliminates redundant actions, improving the overall cooperation efficiency of agents. Experimental results on the ThreeDworld Multi-Agent Transport and Communicative Watch-And-Help tasks demonstrate that CaPo achieves much higher task completion rate and efficiency compared with state-of-the-arts.The code is released at https://github.com/jliu4ai/CaPo.
CVJan 29, 2024Code
Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud SegmentationJie Liu, Wenzhe Yin, Haochen Wang et al.
Few-shot point cloud segmentation seeks to generate per-point masks for previously unseen categories, using only a minimal set of annotated point clouds as reference. Existing prototype-based methods rely on support prototypes to guide the segmentation of query point clouds, but they encounter challenges when significant object variations exist between the support prototypes and query features. In this work, we present dynamic prototype adaptation (DPA), which explicitly learns task-specific prototypes for each query point cloud to tackle the object variation problem. DPA achieves the adaptation through prototype rectification, aligning vanilla prototypes from support with the query feature distribution, and prototype-to-query attention, extracting task-specific context from query point clouds. Furthermore, we introduce a prototype distillation regularization term, enabling knowledge transfer between early-stage prototypes and their deeper counterparts during adaption. By iteratively applying these adaptations, we generate task-specific prototypes for accurate mask predictions on query point clouds. Extensive experiments on two popular benchmarks show that DPA surpasses state-of-the-art methods by a significant margin, e.g., 7.43\% and 6.39\% under the 2-way 1-shot setting on S3DIS and ScanNet, respectively. Code is available at https://github.com/jliu4ai/DPA.
CVJun 28, 2025Code
Probabilistic Prototype Calibration of Vision-Language Models for Generalized Few-shot Semantic SegmentationJie Liu, Jiayi Shen, Pan Zhou et al.
Generalized Few-Shot Semantic Segmentation (GFSS) aims to extend a segmentation model to novel classes with only a few annotated examples while maintaining performance on base classes. Recently, pretrained vision-language models (VLMs) such as CLIP have been leveraged in GFSS to improve generalization on novel classes through multi-modal prototypes learning. However, existing prototype-based methods are inherently deterministic, limiting the adaptability of learned prototypes to diverse samples, particularly for novel classes with scarce annotations. To address this, we propose FewCLIP, a probabilistic prototype calibration framework over multi-modal prototypes from the pretrained CLIP, thus providing more adaptive prototype learning for GFSS. Specifically, FewCLIP first introduces a prototype calibration mechanism, which refines frozen textual prototypes with learnable visual calibration prototypes, leading to a more discriminative and adaptive representation. Furthermore, unlike deterministic prototype learning techniques, FewCLIP introduces distribution regularization over these calibration prototypes. This probabilistic formulation ensures structured and uncertainty-aware prototype learning, effectively mitigating overfitting to limited novel class data while enhancing generalization. Extensive experimental results on PASCAL-5$^i$ and COCO-20$^i$ datasets demonstrate that our proposed FewCLIP significantly outperforms state-of-the-art approaches across both GFSS and class-incremental setting. The code is available at https://github.com/jliu4ai/FewCLIP.
LGFeb 24, 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz DivergenceWenzhe Yin, Zehao Xiao, Pan Zhou et al.
Multimodal alignment is crucial for various downstream tasks such as cross-modal generation and retrieval. Previous multimodal approaches like CLIP utilize InfoNCE to maximize mutual information, primarily aligning pairwise samples across modalities while overlooking distributional differences. In addition, InfoNCE has inherent conflict in terms of alignment and uniformity in multimodality, leading to suboptimal alignment with modality gaps. To overcome the limitations, we propose CS-Aligner, a novel framework that performs distributional vision-language alignment by integrating Cauchy-Schwarz (CS) divergence with mutual information. CS-Aligner captures both the global distribution information of each modality and the pairwise semantic relationships. We find that the CS divergence seamlessly addresses the InfoNCE's alignment-uniformity conflict and serves complementary roles with InfoNCE, yielding tighter and more precise alignment. Moreover, by introducing distributional alignment, CS-Aligner enables incorporating additional information from unpaired data and token-level representations, enhancing flexible and fine-grained alignment in practice. Experiments on text-to-image generation and cross-modality retrieval tasks demonstrate the effectiveness of our method on vision-language alignment.
IVMar 15, 2024
End-to-end Adaptive Dynamic Subsampling and Reconstruction for Cardiac MRIGeorge Yiasemis, Jan-Jakob Sonke, Jonas Teuwen
$\textbf{Background:}$ Accelerating dynamic MRI is vital for advancing clinical applications and improving patient comfort. Commonly, deep learning (DL) methods for accelerated dynamic MRI reconstruction typically rely on uniformly applying non-adaptive predetermined or random subsampling patterns across all temporal frames of the dynamic acquisition. This approach fails to exploit temporal correlations or optimize subsampling on a case-by-case basis. $\textbf{Purpose:}$ To develop an end-to-end approach for adaptive dynamic MRI subsampling and reconstruction, capable of generating customized sampling patterns maximizing at the same time reconstruction quality. $\textbf{Methods:}$ We introduce the End-to-end Adaptive Dynamic Sampling and Reconstruction (E2E-ADS-Recon) for MRI framework, which integrates an adaptive dynamic sampler (ADS) that adapts the acquisition trajectory to each case for a given acceleration factor with a state-of-the-art dynamic reconstruction network, vSHARP, for reconstructing the adaptively sampled data into a dynamic image. The ADS can produce either frame-specific patterns or unified patterns applied to all temporal frames. E2E-ADS-Recon is evaluated under both frame-specific and unified 1D or 2D sampling settings, using dynamic cine cardiac MRI data and compared with vSHARP models employing standard subsampling trajectories, as well as pipelines where ADS was replaced by parameterized samplers optimized for dataset-specific schemes. $\textbf{Results:}$ E2E-ADS-Recon exhibited superior reconstruction quality, especially at high accelerations, in terms of standard quantitative metrics (SSIM, pSNR, NMSE). $\textbf{Conclusion:}$ The proposed framework improves reconstruction quality, highlighting the importance of case-specific subsampling optimization in dynamic MRI applications.
IVNov 2, 2024
Deep Multi-contrast Cardiac MRI Reconstruction via vSHARP with Auxiliary Refinement NetworkGeorge Yiasemis, Nikita Moriakov, Jan-Jakob Sonke et al.
Cardiac MRI (CMRI) is a cornerstone imaging modality that provides in-depth insights into cardiac structure and function. Multi-contrast CMRI (MCCMRI), which acquires sequences with varying contrast weightings, significantly enhances diagnostic capabilities by capturing a wide range of cardiac tissue characteristics. However, MCCMRI is often constrained by lengthy acquisition times and susceptibility to motion artifacts. To mitigate these challenges, accelerated imaging techniques that use k-space undersampling via different sampling schemes at acceleration factors have been developed to shorten scan durations. In this context, we propose a deep learning-based reconstruction method for 2D dynamic multi-contrast, multi-scheme, and multi-acceleration MRI. Our approach integrates the state-of-the-art vSHARP model, which utilizes half-quadratic variable splitting and ADMM optimization, with a Variational Network serving as an Auxiliary Refinement Network (ARN) to better adapt to the diverse nature of MCCMRI data. Specifically, the subsampled k-space data is fed into the ARN, which produces an initial prediction for the denoising step used by vSHARP. This, along with the subsampled k-space, is then used by vSHARP to generate high-quality 2D sequence predictions. Our method outperforms traditional reconstruction techniques and other vSHARP-based models.
CVMay 3, 2025
Probabilistic Interactive 3D Segmentation with Hierarchical Neural ProcessesJie Liu, Pan Zhou, Zehao Xiao et al.
Interactive 3D segmentation has emerged as a promising solution for generating accurate object masks in complex 3D scenes by incorporating user-provided clicks. However, two critical challenges remain underexplored: (1) effectively generalizing from sparse user clicks to produce accurate segmentation, and (2) quantifying predictive uncertainty to help users identify unreliable regions. In this work, we propose NPISeg3D, a novel probabilistic framework that builds upon Neural Processes (NPs) to address these challenges. Specifically, NPISeg3D introduces a hierarchical latent variable structure with scene-specific and object-specific latent variables to enhance few-shot generalization by capturing both global context and object-specific characteristics. Additionally, we design a probabilistic prototype modulator that adaptively modulates click prototypes with object-specific latent variables, improving the model's ability to capture object-aware context and quantify predictive uncertainty. Experiments on four 3D point cloud datasets demonstrate that NPISeg3D achieves superior segmentation performance with fewer clicks while providing reliable uncertainty estimations.
MED-PHJun 23, 2025
Physics-Guided Radiotherapy Treatment Planning with Deep LearningStefanos Achlatis, Efstratios Gavves, Jan-Jakob Sonke
Radiotherapy (RT) is a critical cancer treatment, with volumetric modulated arc therapy (VMAT) being a commonly used technique that enhances dose conformity by dynamically adjusting multileaf collimator (MLC) positions and monitor units (MU) throughout gantry rotation. Adaptive radiotherapy requires frequent modifications to treatment plans to account for anatomical variations, necessitating time-efficient solutions. Deep learning offers a promising solution to automate this process. To this end, we propose a two-stage, physics-guided deep learning pipeline for radiotherapy planning. In the first stage, our network is trained with direct supervision on treatment plan parameters, consisting of MLC and MU values. In the second stage, we incorporate an additional supervision signal derived from the predicted 3D dose distribution, integrating physics-based guidance into the training process. We train and evaluate our approach on 133 prostate cancer patients treated with a uniform 2-arc VMAT protocol delivering a dose of 62 Gy to the planning target volume (PTV). Our results demonstrate that the proposed approach, implemented using both 3D U-Net and UNETR architectures, consistently produces treatment plans that closely match clinical ground truths. Our method achieves a mean difference of D95% = 0.42 +/- 1.83 Gy and V95% = -0.22 +/- 1.87% at the PTV while generating dose distributions that reduce radiation exposure to organs at risk. These findings highlight the potential of physics-guided deep learning in RT planning.
CVFeb 4, 2025
Geometric Neural Process FieldsWenzhe Yin, Zehao Xiao, Jiayi Shen et al.
This paper addresses the challenge of Neural Field (NeF) generalization, where models must efficiently adapt to new signals given only a few observations. To tackle this, we propose Geometric Neural Process Fields (G-NPF), a probabilistic framework for neural radiance fields that explicitly captures uncertainty. We formulate NeF generalization as a probabilistic problem, enabling direct inference of NeF function distributions from limited context observations. To incorporate structural inductive biases, we introduce a set of geometric bases that encode spatial structure and facilitate the inference of NeF function distributions. Building on these bases, we design a hierarchical latent variable model, allowing G-NPF to integrate structural information across multiple spatial levels and effectively parameterize INR functions. This hierarchical approach improves generalization to novel scenes and unseen signals. Experiments on novel-view synthesis for 3D scenes, as well as 2D image and 1D signal regression, demonstrate the effectiveness of our method in capturing uncertainty and leveraging structural information for improved generalization.
IVNov 27, 2024
Deep End-to-end Adaptive k-Space Sampling, Reconstruction, and Registration for Dynamic MRIGeorge Yiasemis, Jan-Jakob Sonke, Jonas Teuwen
Dynamic MRI enables a range of clinical applications, including cardiac function assessment, organ motion tracking, and radiotherapy guidance. However, fully sampling the dynamic k-space data is often infeasible due to time constraints and physiological motion such as respiratory and cardiac motion. This necessitates undersampling, which degrades the quality of reconstructed images. Poor image quality not only hinders visualization but also impairs the estimation of deformation fields, crucial for registering dynamic (moving) images to a static reference image. This registration enables tasks such as motion correction, treatment planning, and quantitative analysis in applications like cardiac imaging and MR-guided radiotherapy. To overcome the challenges posed by undersampling and motion, we introduce an end-to-end deep learning (DL) framework that integrates adaptive dynamic k-space sampling, reconstruction, and registration. Our approach begins with a DL-based adaptive sampling strategy, optimizing dynamic k-space acquisition to capture the most relevant data for each specific case. This is followed by a DL-based reconstruction module that produces images optimized for accurate deformation field estimation from the undersampled moving data. Finally, a registration module estimates the deformation fields aligning the reconstructed dynamic images with a static reference. The proposed framework is independent of specific reconstruction and registration modules allowing for plug-and-play integration of these components. The entire framework is jointly trained using a combination of supervised and unsupervised loss functions, enabling end-to-end optimization for improved performance across all components. Through controlled experiments and ablation studies, we validate each component, demonstrating that each choice contributes to robust motion estimation from undersampled dynamic data.
LGJun 10, 2024
Space-Time Continuous PDE Forecasting using Equivariant Neural FieldsDavid M. Knigge, David R. Wessels, Riccardo Valperga et al.
Recently, Conditional Neural Fields (NeFs) have emerged as a powerful modelling paradigm for PDEs, by learning solutions as flows in the latent space of the Conditional NeF. Although benefiting from favourable properties of NeFs such as grid-agnosticity and space-time-continuous dynamics modelling, this approach limits the ability to impose known constraints of the PDE on the solutions -- e.g. symmetries or boundary conditions -- in favour of modelling flexibility. Instead, we propose a space-time continuous NeF-based solving framework that - by preserving geometric information in the latent space - respects known symmetries of the PDE. We show that modelling solutions as flows of pointclouds over the group of interest $G$ improves generalization and data-efficiency. We validated that our framework readily generalizes to unseen spatial and temporal locations, as well as geometric transformations of the initial conditions - where other NeF-based PDE forecasting methods fail - and improve over baselines in a number of challenging geometries.
MED-PHJan 20, 2024
Equivariant Multiscale Learned Invertible Reconstruction for Cone Beam CTNikita Moriakov, Jan-Jakob Sonke, Jonas Teuwen
Cone Beam CT (CBCT) is an essential imaging modality nowadays, but the image quality of CBCT still lags behind the high quality standards established by the conventional Computed Tomography. We propose LIRE+, a learned iterative scheme for fast and memory-efficient CBCT reconstruction, which is a substantially faster and more parameter-efficient alternative to the recently proposed LIRE method. LIRE+ is a rotationally-equivariant multiscale learned invertible primal-dual iterative scheme for CBCT reconstruction. Memory usage is optimized by relying on simple reversible residual networks in primal/dual cells and patch-wise computations inside the cells during forward and backward passes, while increased inference speed is achieved by making the primal-dual scheme multiscale so that the reconstruction process starts at low resolution and with low resolution primal/dual latent vectors. A LIRE+ model was trained and validated on a set of 260 + 22 thorax CT scans and tested using a set of 142 thorax CT scans with additional evaluation with and without finetuning on an out-of-distribution set of 79 Head and Neck (HN) CT scans. Our method surpasses classical and deep learning baselines, including LIRE, on the thorax test set. For a similar inference time and with only 37 % of the parameter budget, LIRE+ achieves a +0.2 dB PSNR improvement over LIRE, while being able to match the performance of LIRE in 45 % less inference time and with 28 % of the parameter budget. Rotational equivariance ensures robustness of LIRE+ to patient orientation, while LIRE and other deep learning baselines suffer from substantial performance degradation when patient orientation is unusual. On the HN dataset in the absence of finetuning, LIRE+ is generally comparable to LIRE in performance apart from a few outlier cases, whereas after identical finetuning LIRE+ demonstates a +1.02 dB PSNR improvement over LIRE.
IVNov 18, 2021
Recurrent Variational Network: A Deep Learning Inverse Problem Solver applied to the task of Accelerated MRI ReconstructionGeorge Yiasemis, Jan-Jakob Sonke, Clarisa Sánchez et al.
Magnetic Resonance Imaging can produce detailed images of the anatomy and physiology of the human body that can assist doctors in diagnosing and treating pathologies such as tumours. However, MRI suffers from very long acquisition times that make it susceptible to patient motion artifacts and limit its potential to deliver dynamic treatments. Conventional approaches such as Parallel Imaging and Compressed Sensing allow for an increase in MRI acquisition speed by reconstructing MR images from sub-sampled MRI data acquired using multiple receiver coils. Recent advancements in Deep Learning combined with Parallel Imaging and Compressed Sensing techniques have the potential to produce high-fidelity reconstructions from highly accelerated MRI data. In this work we present a novel Deep Learning-based Inverse Problem solver applied to the task of Accelerated MRI Reconstruction, called the Recurrent Variational Network (RecurrentVarNet), by exploiting the properties of Convolutional Recurrent Neural Networks and unrolled algorithms for solving Inverse Problems. The RecurrentVarNet consists of multiple recurrent blocks, each responsible for one iteration of the unrolled variational optimization scheme for solving the inverse problem of multi-coil Accelerated MRI Reconstruction. Contrary to traditional approaches, the optimization steps are performed in the observation domain ($k$-space) instead of the image domain. Each block of the RecurrentVarNet refines the observed $k$-space and comprises a data consistency term and a recurrent unit which takes as input a learned hidden state and the prediction of the previous block. Our proposed method achieves new state of the art qualitative and quantitative reconstruction results on 5-fold and 10-fold accelerated data from a public multi-coil brain dataset, outperforming previous conventional and deep learning-based approaches.
CVOct 28, 2021
Subpixel object segmentation using wavelets and multi resolution analysisRay Sheombarsing, Nikita Moriakov, Jan-Jakob Sonke et al.
We propose a novel deep learning framework for fast prediction of boundaries of two-dimensional simply connected domains using wavelets and Multi Resolution Analysis (MRA). The boundaries are modelled as (piecewise) smooth closed curves using wavelets and the so-called Pyramid Algorithm. Our network architecture is a hybrid analog of the U-Net, where the down-sampling path is a two-dimensional encoder with learnable filters, and the upsampling path is a one-dimensional decoder, which builds curves up from low to high resolution levels. Any wavelet basis induced by a MRA can be used. This flexibility allows for incorporation of priors on the smoothness of curves. The effectiveness of the proposed method is demonstrated by delineating boundaries of simply connected domains (organs) in medical images using Debauches wavelets and comparing performance with a U-Net baseline. Our model demonstrates up to 5x faster inference speed compared to the U-Net, while maintaining similar performance in terms of Dice score and Hausdorff distance.
IVAug 17, 2021
Deep MRI Reconstruction with Radial SubsamplingGeorge Yiasemis, Chaoping Zhang, Clara I. Sánchez et al.
In spite of its extensive adaptation in almost every medical diagnostic and examinatorial application, Magnetic Resonance Imaging (MRI) is still a slow imaging modality which limits its use for dynamic imaging. In recent years, Parallel Imaging (PI) and Compressed Sensing (CS) have been utilised to accelerate the MRI acquisition. In clinical settings, subsampling the k-space measurements during scanning time using Cartesian trajectories, such as rectilinear sampling, is currently the most conventional CS approach applied which, however, is prone to producing aliased reconstructions. With the advent of the involvement of Deep Learning (DL) in accelerating the MRI, reconstructing faithful images from subsampled data became increasingly promising. Retrospectively applying a subsampling mask onto the k-space data is a way of simulating the accelerated acquisition of k-space data in real clinical setting. In this paper we compare and provide a review for the effect of applying either rectilinear or radial retrospective subsampling on the quality of the reconstructions outputted by trained deep neural networks. With the same choice of hyper-parameters, we train and evaluate two distinct Recurrent Inference Machines (RIMs), one for each type of subsampling. The qualitative and quantitative results of our experiments indicate that the model trained on data with radial subsampling attains higher performance and learns to estimate reconstructions with higher fidelity paving the way for other DL approaches to involve radial subsampling.