LGNov 4, 2022
MONAI: An open-source framework for deep learning in healthcareM. Jorge Cardoso, Wenqi Li, Richard Brown et al.
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provide purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software-development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
IVAug 30, 2023Code
A Recycling Training Strategy for Medical Image Segmentation with Diffusion Denoising ModelsYunguan Fu, Yiwen Li, Shaheer U Saeed et al.
Denoising diffusion models have found applications in image segmentation by generating segmented masks conditioned on images. Existing studies predominantly focus on adjusting model architecture or improving inference, such as test-time sampling strategies. In this work, we focus on improving the training strategy and propose a novel recycling method. During each training step, a segmentation mask is first predicted given an image and a random noise. This predicted mask, which replaces the conventional ground truth mask, is used for denoising task during training. This approach can be interpreted as aligning the training strategy with inference by eliminating the dependence on ground truth masks for generating noisy samples. Our proposed method significantly outperforms standard diffusion training, self-conditioning, and existing recycling strategies across multiple medical imaging data sets: muscle ultrasound, abdominal CT, prostate MR, and brain MR. This holds for two widely adopted sampling strategies: denoising diffusion probabilistic model and denoising diffusion implicit model. Importantly, existing diffusion models often display a declining or unstable performance during inference, whereas our novel recycling consistently enhances or maintains performance. We show that, under a fair comparison with the same network architectures and computing budget, the proposed recycling-based diffusion models achieved on-par performance with non-diffusion-based supervised training. By ensembling the proposed diffusion and the non-diffusion models, significant improvements to the non-diffusion models have been observed across all applications, demonstrating the value of this novel training method. This paper summarizes these quantitative results and discusses their values, with a fully reproducible JAX-based implementation, released at https://github.com/mathpluscode/ImgX-DiffSeg.
IVMar 10, 2023Code
Importance of Aligning Training Strategy with Evaluation for Diffusion Models in 3D Multiclass SegmentationYunguan Fu, Yiwen Li, Shaheer U. Saeed et al.
Recently, denoising diffusion probabilistic models (DDPM) have been applied to image segmentation by generating segmentation masks conditioned on images, while the applications were mainly limited to 2D networks without exploiting potential benefits from the 3D formulation. In this work, we studied the DDPM-based segmentation model for 3D multiclass segmentation on two large multiclass data sets (prostate MR and abdominal CT). We observed that the difference between training and test methods led to inferior performance for existing DDPM methods. To mitigate the inconsistency, we proposed a recycling method which generated corrupted masks based on the model's prediction at a previous time step instead of using ground truth. The proposed method achieved statistically significantly improved performance compared to existing DDPMs, independent of a number of other techniques for reducing train-test discrepancy, including performing mask prediction, using Dice loss, and reducing the number of diffusion time steps during training. The performance of diffusion models was also competitive and visually similar to non-diffusion-based U-net, within the same compute budget. The JAX-based diffusion framework has been released at https://github.com/mathpluscode/ImgX-DiffSeg.
IVSep 5, 2022
Domain Generalization for Prostate Segmentation in Transrectal Ultrasound Images: A Multi-center StudySulaiman Vesal, Iani Gayo, Indrani Bhattacharya et al. · stanford
Prostate biopsy and image-guided treatment procedures are often performed under the guidance of ultrasound fused with magnetic resonance images (MRI). Accurate image fusion relies on accurate segmentation of the prostate on ultrasound images. Yet, the reduced signal-to-noise ratio and artifacts (e.g., speckle and shadowing) in ultrasound images limit the performance of automated prostate segmentation techniques and generalizing these methods to new image domains is inherently difficult. In this study, we address these challenges by introducing a novel 2.5D deep neural network for prostate segmentation on ultrasound images. Our approach addresses the limitations of transfer learning and finetuning methods (i.e., drop in performance on the original training data when the model weights are updated) by combining a supervised domain adaptation technique and a knowledge distillation loss. The knowledge distillation loss allows the preservation of previously learned knowledge and reduces the performance drop after model finetuning on new datasets. Furthermore, our approach relies on an attention module that considers model feature positioning information to improve the segmentation accuracy. We trained our model on 764 subjects from one institution and finetuned our model using only ten subjects from subsequent institutions. We analyzed the performance of our method on three large datasets encompassing 2067 subjects from three different institutions. Our method achieved an average Dice Similarity Coefficient (Dice) of $94.0\pm0.03$ and Hausdorff Distance (HD95) of 2.28 $mm$ in an independent set of subjects from the first institution. Moreover, our model generalized well in the studies from the other two institutions (Dice: $91.0\pm0.03$; HD95: 3.7$mm$ and Dice: $82.0\pm0.03$; HD95: 7.1 $mm$).
IVJul 8, 2024Code
Nonrigid Reconstruction of Freehand Ultrasound without a TrackerQi Li, Ziyi Shen, Qianye Yang et al.
Reconstructing 2D freehand Ultrasound (US) frames into 3D space without using a tracker has recently seen advances with deep learning. Predicting good frame-to-frame rigid transformations is often accepted as the learning objective, especially when the ground-truth labels from spatial tracking devices are inherently rigid transformations. Motivated by a) the observed nonrigid deformation due to soft tissue motion during scanning, and b) the highly sensitive prediction of rigid transformation, this study investigates the methods and their benefits in predicting nonrigid transformations for reconstructing 3D US. We propose a novel co-optimisation algorithm for simultaneously estimating rigid transformations among US frames, supervised by ground-truth from a tracker, and a nonrigid deformation, optimised by a regularised registration network. We show that these two objectives can be either optimised using meta-learning or combined by weighting. A fast scattered data interpolation is also developed for enabling frequent reconstruction and registration of non-parallel US frames, during training. With a new data set containing over 357,000 frames in 720 scans, acquired from 60 subjects, the experiments demonstrate that, due to an expanded thus easier-to-optimise solution space, the generalisation is improved with the added deformation estimation, with respect to the rigid ground-truth. The global pixel reconstruction error (assessing accumulative prediction) is lowered from 18.48 to 16.51 mm, compared with baseline rigid-transformation-predicting methods. Using manually identified landmarks, the proposed co-optimisation also shows potentials in compensating nonrigid tissue motion at inference, which is not measurable by tracker-provided ground-truth. The code and data used in this paper are made publicly available at https://github.com/QiLi111/NR-Rec-FUS.
IVFeb 20, 2023
Non-rigid Medical Image Registration using Physics-informed Neural NetworksZhe Min, Zachary M. C. Baum, Shaheer U. Saeed et al.
Biomechanical modelling of soft tissue provides a non-data-driven method for constraining medical image registration, such that the estimated spatial transformation is considered biophysically plausible. This has not only been adopted in real-world clinical applications, such as the MR-to-ultrasound registration for prostate intervention of interest in this work, but also provides an explainable means of understanding the organ motion and spatial correspondence establishment. This work instantiates the recently-proposed physics-informed neural networks (PINNs) to a 3D linear elastic model for modelling prostate motion commonly encountered during transrectal ultrasound guided procedures. To overcome a widely-recognised challenge in generalising PINNs to different subjects, we propose to use PointNet as the nodal-permutation-invariant feature extractor, together with a registration algorithm that aligns point sets and simultaneously takes into account the PINN-imposed biomechanics. The proposed method has been both developed and validated in both patient-specific and multi-patient manner.
IVMar 3, 2023
Bi-parametric prostate MR image synthesis using pathology and sequence-conditioned stable diffusionShaheer U. Saeed, Tom Syer, Wen Yan et al.
We propose an image synthesis mechanism for multi-sequence prostate MR images conditioned on text, to control lesion presence and sequence, as well as to generate paired bi-parametric images conditioned on images e.g. for generating diffusion-weighted MR from T2-weighted MR for paired data, which are two challenging tasks in pathological image synthesis. Our proposed mechanism utilises and builds upon the recent stable diffusion model by proposing image-based conditioning for paired data generation. We validate our method using 2D image slices from real suspected prostate cancer patients. The realism of the synthesised images is validated by means of a blind expert evaluation for identifying real versus fake images, where a radiologist with 4 years experience reading urological MR only achieves 59.4% accuracy across all tested sequences (where chance is 50%). For the first time, we evaluate the realism of the generated pathology by blind expert identification of the presence of suspected lesions, where we find that the clinician performs similarly for both real and synthesised images, with a 2.9 percentage point difference in lesion identification accuracy between real and synthesised images, demonstrating the potentials in radiological training purposes. Furthermore, we also show that a machine learning model, trained for lesion identification, shows better performance (76.2% vs 70.4%, statistically significant improvement) when trained with real data augmented by synthesised data as opposed to training with only real images, demonstrating usefulness for model training.
IVMar 27, 2022
Image quality assessment for machine learning tasks using meta-reinforcement learningShaheer U. Saeed, Yunguan Fu, Vasilis Stavrinides et al.
In this paper, we consider image quality assessment (IQA) as a measure of how images are amenable with respect to a given downstream task, or task amenability. When the task is performed using machine learning algorithms, such as a neural-network-based task predictor for image classification or segmentation, the performance of the task predictor provides an objective estimate of task amenability. In this work, we use an IQA controller to predict the task amenability which, itself being parameterised by neural networks, can be trained simultaneously with the task predictor. We further develop a meta-reinforcement learning framework to improve the adaptability for both IQA controllers and task predictors, such that they can be fine-tuned efficiently on new datasets or meta-tasks. We demonstrate the efficacy of the proposed task-specific, adaptable IQA approach, using two clinical applications for ultrasound-guided prostate intervention and pneumonia detection on X-ray images.
CVJul 3, 2024
Biomechanics-informed Non-rigid Medical Image Registration and its Inverse Material Property Estimation with Linear and Nonlinear ElasticityZhe Min, Zachary M. C. Baum, Shaheer U. Saeed et al.
This paper investigates both biomechanical-constrained non-rigid medical image registrations and accurate identifications of material properties for soft tissues, using physics-informed neural networks (PINNs). The complex nonlinear elasticity theory is leveraged to formally establish the partial differential equations (PDEs) representing physics laws of biomechanical constraints that need to be satisfied, with which registration and identification tasks are treated as forward (i.e., data-driven solutions of PDEs) and inverse (i.e., parameter estimation) problems under PINNs respectively. Two net configurations (i.e., Cfg1 and Cfg2) have also been compared for both linear and nonlinear physics model. Two sets of experiments have been conducted, using pairs of undeformed and deformed MR images from clinical cases of prostate cancer biopsy. Our contributions are summarised as follows. 1) We developed a learning-based biomechanical-constrained non-rigid registration algorithm using PINNs, where linear elasticity is generalised to the nonlinear version. 2) We demonstrated extensively that nonlinear elasticity shows no statistical significance against linear models in computing point-wise displacement vectors but their respective benefits may depend on specific patients, with finite-element (FE) computed ground-truth. 3) We formulated and solved the inverse parameter estimation problem, under the joint optimisation scheme of registration and parameter identification using PINNs, whose solutions can be accurately found by locating saddle points.
IVJul 17, 2023
Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer LocalisationWen Yan, Bernard Chiu, Ziyi Shen et al.
One of the distinct characteristics in radiologists' reading of multiparametric prostate MR scans, using reporting systems such as PI-RADS v2.1, is to score individual types of MR modalities, T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant cancer. This work aims to demonstrate that it is feasible for low-dimensional parametric models to model such decision rules in the proposed Combiner networks, without compromising the accuracy of predicting radiologic labels: First, it is shown that either a linear mixture model or a nonlinear stacking model is sufficient to model PI-RADS decision rules for localising prostate cancer. Second, parameters of these (generalised) linear models are proposed as hyperparameters, to weigh multiple networks that independently represent individual image modalities in the Combiner network training, as opposed to end-to-end modality ensemble. A HyperCombiner network is developed to train a single image segmentation network that can be conditioned on these hyperparameters during inference, for much improved efficiency. Experimental results based on data from 850 patients, for the application of automating radiologist labelling multi-parametric MR, compare the proposed combiner networks with other commonly-adopted end-to-end networks. Using the added advantages of obtaining and interpreting the modality combining rules, in terms of the linear weights or odds-ratios on individual image modalities, three clinical applications are presented for prostate cancer segmentation, including modality availability assessment, importance quantification and rule discovery.
CVJul 25, 2024Code
Segmentation by registration-enabled SAM prompt engineering using five reference imagesYaxi Chen, Aleksandra Ivanova, Shaheer U. Saeed et al.
The recently proposed Segment Anything Model (SAM) is a general tool for image segmentation, but it requires additional adaptation and careful fine-tuning for medical image segmentation, especially for small, irregularly-shaped, and boundary-ambiguous anatomical structures such as the knee cartilage that is of interest in this work. Repaired cartilage, after certain surgical procedures, exhibits imaging patterns unseen to pre-training, posing further challenges for using models like SAM with or without general-purpose fine-tuning. To address this, we propose a novel registration-based prompt engineering framework for medical image segmentation using SAM. This approach utilises established image registration algorithms to align the new image (to-be-segmented) and a small number of reference images, without requiring segmentation labels. The spatial transformations generated by registration align either the new image or pre-defined point-based prompts, before using them as input to SAM. This strategy, requiring as few as five reference images with defined point prompts, effectively prompts SAM for inference on new images, without needing any segmentation labels. Evaluation of MR images from patients who received cartilage stem cell therapy yielded Dice scores of 0.89, 0.87, 0.53, and 0.52 for segmenting femur, tibia, femoral- and tibial cartilages, respectively. This outperforms atlas-based label fusion and is comparable to supervised nnUNet, an upper-bound fair baseline in this application, both of which require full segmentation labels for reference samples. The codes are available at: https://github.com/chrissyinreallife/KneeSegmentWithSAM.git
IVJul 29, 2023
Interpolation-Split: a data-centric deep learning approach with big interpolated data to boost airway segmentation performanceWing Keung Cheung, Ashkan Pakzad, Nesrin Mogulkoc et al.
The morphology and distribution of airway tree abnormalities enables diagnosis and disease characterisation across a variety of chronic respiratory conditions. In this regard, airway segmentation plays a critical role in the production of the outline of the entire airway tree to enable estimation of disease extent and severity. In this study, we propose a data-centric deep learning technique to segment the airway tree. The proposed technique utilises interpolation and image split to improve data usefulness and quality. Then, an ensemble learning strategy is implemented to aggregate the segmented airway trees at different scales. In terms of segmentation performance (dice similarity coefficient), our method outperforms the baseline model by 2.5% on average when a combined loss is used. Further, our proposed technique has a low GPU usage and high flexibility enabling it to be deployed on any 2D deep learning model.
IVJul 13, 2022
Collaborative Quantization Embeddings for Intra-Subject Prostate MR Image RegistrationZiyi Shen, Qianye Yang, Yuming Shen et al.
Image registration is useful for quantifying morphological changes in longitudinal MR images from prostate cancer patients. This paper describes a development in improving the learning-based registration algorithms, for this challenging clinical application often with highly variable yet limited training data. First, we report that the latent space can be clustered into a much lower dimensional space than that commonly found as bottleneck features at the deep layer of a trained registration network. Based on this observation, we propose a hierarchical quantization method, discretizing the learned feature vectors using a jointly-trained dictionary with a constrained size, in order to improve the generalisation of the registration networks. Furthermore, a novel collaborative dictionary is independently optimised to incorporate additional prior information, such as the segmentation of the gland or other regions of interest, in the latent quantized space. Based on 216 real clinical images from 86 prostate cancer patients, we show the efficacy of both the designed components. Improved registration accuracy was obtained with statistical significance, in terms of both Dice on gland and target registration error on corresponding landmarks, the latter of which achieved 5.46 mm, an improvement of 28.7\% from the baseline without quantization. Experimental results also show that the difference in performance was indeed minimised between training and testing data.
CVJul 26, 2022
Cross-Modality Image Registration using a Training-Time Privileged Third ModalityQianye Yang, David Atkinson, Yunguan Fu et al.
In this work, we consider the task of pairwise cross-modality image registration, which may benefit from exploiting additional images available only at training time from an additional modality that is different to those being registered. As an example, we focus on aligning intra-subject multiparametric Magnetic Resonance (mpMR) images, between T2-weighted (T2w) scans and diffusion-weighted scans with high b-value (DWI$_{high-b}$). For the application of localising tumours in mpMR images, diffusion scans with zero b-value (DWI$_{b=0}$) are considered easier to register to T2w due to the availability of corresponding features. We propose a learning from privileged modality algorithm, using a training-only imaging modality DWI$_{b=0}$, to support the challenging multi-modality registration problems. We present experimental results based on 369 sets of 3D multiparametric MRI images from 356 prostate cancer patients and report, with statistical significance, a lowered median target registration error of 4.34 mm, when registering the holdout DWI$_{high-b}$ and T2w image pairs, compared with that of 7.96 mm before registration. Results also show that the proposed learning-based registration networks enabled efficient registration with comparable or better accuracy, compared with a classical iterative algorithm and other tested learning-based methods with/without the additional modality. These compared algorithms also failed to produce any significantly improved alignment between DWI$_{high-b}$ and T2w in this challenging application.
IVNov 9, 2022
Trackerless freehand ultrasound with sequence modelling and auxiliary transformation over past and future framesQi Li, Ziyi Shen, Qian Li et al.
Three-dimensional (3D) freehand ultrasound (US) reconstruction without a tracker can be advantageous over its two-dimensional or tracked counterparts in many clinical applications. In this paper, we propose to estimate 3D spatial transformation between US frames from both past and future 2D images, using feed-forward and recurrent neural networks (RNNs). With the temporally available frames, a further multi-task learning algorithm is proposed to utilise a large number of auxiliary transformation-predicting tasks between them. Using more than 40,000 US frames acquired from 228 scans on 38 forearms of 19 volunteers in a volunteer study, the hold-out test performance is quantified by frame prediction accuracy, volume reconstruction overlap, accumulated tracking error and final drift, based on ground-truth from an optical tracker. The results show the importance of modelling the temporal-spatially correlated input frames as well as output transformations, with further improvement owing to additional past and/or future frames. The best performing model was associated with predicting transformation between moderately-spaced frames, with an interval of less than ten frames at 20 frames per second (fps). Little benefit was observed by adding frames more than one second away from the predicted transformation, with or without LSTM-based RNNs. Interestingly, with the proposed approach, explicit within-sequence loss that encourages consistency in composing transformations or minimises accumulated error may no longer be required. The implementation code and volunteer data will be made publicly available ensuring reproducibility and further research.
IVSep 12, 2022
Prototypical few-shot segmentation for cross-institution male pelvic structures with spatial registrationYiwen Li, Yunguan Fu, Iani Gayo et al.
The prowess that makes few-shot learning desirable in medical image analysis is the efficient use of the support image data, which are labelled to classify or segment new classes, a task that otherwise requires substantially more training images and expert annotations. This work describes a fully 3D prototypical few-shot segmentation algorithm, such that the trained networks can be effectively adapted to clinically interesting structures that are absent in training, using only a few labelled images from a different institute. First, to compensate for the widely recognised spatial variability between institutions in episodic adaptation of novel classes, a novel spatial registration mechanism is integrated into prototypical learning, consisting of a segmentation head and an spatial alignment module. Second, to assist the training with observed imperfect alignment, support mask conditioning module is proposed to further utilise the annotation available from the support images. Extensive experiments are presented in an application of segmenting eight anatomical structures important for interventional planning, using a data set of 589 pelvic T2-weighted MR images, acquired at seven institutes. The results demonstrate the efficacy in each of the 3D formulation, the spatial registration, and the support mask conditioning, all of which made positive contributions independently or collectively. Compared with the previously proposed 2D alternatives, the few-shot segmentation performance was improved with statistical significance, regardless whether the support data come from the same or different institutes.
IVOct 27, 2022
Meta-Learning Initializations for Interactive Medical Image RegistrationZachary M. C. Baum, Yipeng Hu, Dean Barratt
We present a meta-learning framework for interactive medical image registration. Our proposed framework comprises three components: a learning-based medical image registration algorithm, a form of user interaction that refines registration at inference, and a meta-learning protocol that learns a rapidly adaptable network initialization. This paper describes a specific algorithm that implements the registration, interaction and meta-learning protocol for our exemplar clinical application: registration of magnetic resonance (MR) imaging to interactively acquired, sparsely-sampled transrectal ultrasound (TRUS) images. Our approach obtains comparable registration error (4.26 mm) to the best-performing non-interactive learning-based 3D-to-3D method (3.97 mm) while requiring only a fraction of the data, and occurring in real-time during acquisition. Applying sparsely sampled data to non-interactive methods yields higher registration errors (6.26 mm), demonstrating the effectiveness of interactive MR-TRUS registration, which may be applied intraoperatively given the real-time nature of the adaptation process.
IVOct 16, 2023
Long-term Dependency for 3D Reconstruction of Freehand Ultrasound Without External TrackerQi Li, Ziyi Shen, Qian Li et al.
Objective: Reconstructing freehand ultrasound in 3D without any external tracker has been a long-standing challenge in ultrasound-assisted procedures. We aim to define new ways of parameterising long-term dependencies, and evaluate the performance. Methods: First, long-term dependency is encoded by transformation positions within a frame sequence. This is achieved by combining a sequence model with a multi-transformation prediction. Second, two dependency factors are proposed, anatomical image content and scanning protocol, for contributing towards accurate reconstruction. Each factor is quantified experimentally by reducing respective training variances. Results: 1) The added long-term dependency up to 400 frames at 20 frames per second (fps) indeed improved reconstruction, with an up to 82.4% lowered accumulated error, compared with the baseline performance. The improvement was found to be dependent on sequence length, transformation interval and scanning protocol and, unexpectedly, not on the use of recurrent networks with long-short term modules; 2) Decreasing either anatomical or protocol variance in training led to poorer reconstruction accuracy. Interestingly, greater performance was gained from representative protocol patterns, than from representative anatomical features. Conclusion: The proposed algorithm uses hyperparameter tuning to effectively utilise long-term dependency. The proposed dependency factors are of practical significance in collecting diverse training data, regulating scanning protocols and developing efficient networks. Significance: The proposed new methodology with publicly available volunteer data and code for parametersing the long-term dependency, experimentally shown to be valid sources of performance improvement, which could potentially lead to better model development and practical optimisation of the reconstruction application.
CVAug 8, 2022
Bayesian Pseudo Labels: Expectation Maximization for Robust and Efficient Semi-Supervised SegmentationMou-Cheng Xu, Yukun Zhou, Chen Jin et al.
This paper concerns pseudo labelling in segmentation. Our contribution is fourfold. Firstly, we present a new formulation of pseudo-labelling as an Expectation-Maximization (EM) algorithm for clear statistical interpretation. Secondly, we propose a semi-supervised medical image segmentation method purely based on the original pseudo labelling, namely SegPL. We demonstrate SegPL is a competitive approach against state-of-the-art consistency regularisation based methods on semi-supervised segmentation on a 2D multi-class MRI brain tumour segmentation task and a 3D binary CT lung vessel segmentation task. The simplicity of SegPL allows less computational cost comparing to prior methods. Thirdly, we demonstrate that the effectiveness of SegPL may originate from its robustness against out-of-distribution noises and adversarial attacks. Lastly, under the EM framework, we introduce a probabilistic generalisation of SegPL via variational inference, which learns a dynamic threshold for pseudo labelling during the training. We show that SegPL with variational inference can perform uncertainty estimation on par with the gold-standard method Deep Ensemble.
CVDec 3, 2022
Active learning using adaptable task-based prioritisationShaheer U. Saeed, João Ramalhinho, Mark Pinnock et al.
Supervised machine learning-based medical image computing applications necessitate expert label curation, while unlabelled image data might be relatively abundant. Active learning methods aim to prioritise a subset of available image data for expert annotation, for label-efficient model training. We develop a controller neural network that measures priority of images in a sequence of batches, as in batch-mode active learning, for multi-class segmentation tasks. The controller is optimised by rewarding positive task-specific performance gain, within a Markov decision process (MDP) environment that also optimises the task predictor. In this work, the task predictor is a segmentation network. A meta-reinforcement learning algorithm is proposed with multiple MDPs, such that the pre-trained controller can be adapted to a new MDP that contains data from different institutes and/or requires segmentation of different organs or structures within the abdomen. We present experimental results using multiple CT datasets from more than one thousand patients, with segmentation tasks of nine different abdominal organs, to demonstrate the efficacy of the learnt prioritisation controller function and its cross-institute and cross-organ adaptability. We show that the proposed adaptable prioritisation metric yields converging segmentation accuracy for the novel class of kidney, unseen in training, using between approximately 40\% to 60\% of labels otherwise required with other heuristic or random prioritisation metrics. For clinical datasets of limited size, the proposed adaptable prioritisation offers a performance improvement of 22.6\% and 10.2\% in Dice score, for tasks of kidney and liver vessel segmentation, respectively, compared to random prioritisation and alternative active sampling strategies.
CVMar 1Code
Flow Matching-enabled Test-Time Refinement for Unsupervised Cardiac MR RegistrationYunguan Fu, Wenjia Bai, Wen Yan et al.
Diffusion-based unsupervised image registration has been explored for cardiac cine MR, but expensive multi-step inference limits practical use. We propose FlowReg, a flow-matching framework in displacement field space that achieves strong registration in as few as two steps and supports further refinement with more steps. FlowReg uses warmup-reflow training: a single-step network first acts as a teacher, then a student learns to refine from arbitrary intermediate states, removing the need for a pre-trained model as in existing methods. An Initial Guess strategy feeds back the model prediction as the next starting point, improving refinement from step two onward. On ACDC and MM2 across six tasks (including cross-dataset generalization), FlowReg outperforms the state of the art on five tasks (+0.6% mean Dice score on average), with the largest gain in the left ventricle (+1.09%), and reduces LVEF estimation error on all six tasks (-2.58 percentage points), using only 0.7% extra parameters and no segmentation labels. Code is available at https://github.com/mathpluscode/FlowReg.
IVMar 12, 2022
VAFO-Loss: VAscular Feature Optimised Loss Function for Retinal Artery/Vein SegmentationYukun Zhou, Moucheng Xu, Yipeng Hu et al.
Estimating clinically-relevant vascular features following vessel segmentation is a standard pipeline for retinal vessel analysis, which provides potential ocular biomarkers for both ophthalmic disease and systemic disease. In this work, we integrate these clinical features into a novel vascular feature optimised loss function (VAFO-Loss), in order to regularise networks to produce segmentation maps, with which more accurate vascular features can be derived. Two common vascular features, vessel density and fractal dimension, are identified to be sensitive to intra-segment misclassification, which is a well-recognised problem in multi-class artery/vein segmentation particularly hindering the estimation of these vascular features. Thus we encode these two features into VAFO-Loss. We first show that incorporating our end-to-end VAFO-Loss in standard segmentation networks indeed improves vascular feature estimation, yielding quantitative improvement in stroke incidence prediction, a clinical downstream task. We also report a technically interesting finding that the trained segmentation network, albeit biased by the feature optimised loss VAFO-Loss, shows statistically significant improvement in segmentation metrics, compared to those trained with other state-of-the-art segmentation losses.
CVAug 22, 2023
Boundary-RL: Reinforcement Learning for Weakly-Supervised Prostate Segmentation in TRUS ImagesWeixi Yi, Vasilis Stavrinides, Zachary M. C. Baum et al.
We propose Boundary-RL, a novel weakly supervised segmentation method that utilises only patch-level labels for training. We envision the segmentation as a boundary detection problem, rather than a pixel-level classification as in previous works. This outlook on segmentation may allow for boundary delineation under challenging scenarios such as where noise artefacts may be present within the region-of-interest (ROI) boundaries, where traditional pixel-level classification-based weakly supervised methods may not be able to effectively segment the ROI. Particularly of interest, ultrasound images, where intensity values represent acoustic impedance differences between boundaries, may also benefit from the boundary delineation approach. Our method uses reinforcement learning to train a controller function to localise boundaries of ROIs using a reward derived from a pre-trained boundary-presence classifier. The classifier indicates when an object boundary is encountered within a patch, as the controller modifies the patch location in a sequential Markov decision process. The classifier itself is trained using only binary patch-level labels of object presence, which are the only labels used during training of the entire boundary delineation framework, and serves as a weak signal to inform the boundary delineation. The use of a controller function ensures that a sliding window over the entire image is not necessary. It also prevents possible false-positive or -negative cases by minimising number of patches passed to the boundary-presence classifier. We evaluate our proposed approach for a clinically relevant task of prostate gland segmentation on trans-rectal ultrasound images. We show improved performance compared to other tested weakly supervised methods, using the same labels e.g., multiple instance learning.
IVAug 22, 2022
Optimising Chest X-Rays for Image Analysis by Identifying and Removing Confounding FactorsShahab Aslani, Watjana Lilaonitkul, Vaishnavi Gnanananthan et al.
During the COVID-19 pandemic, the sheer volume of imaging performed in an emergency setting for COVID-19 diagnosis has resulted in a wide variability of clinical CXR acquisitions. This variation is seen in the CXR projections used, image annotations added and in the inspiratory effort and degree of rotation of clinical images. The image analysis community has attempted to ease the burden on overstretched radiology departments during the pandemic by developing automated COVID-19 diagnostic algorithms, the input for which has been CXR imaging. Large publicly available CXR datasets have been leveraged to improve deep learning algorithms for COVID-19 diagnosis. Yet the variable quality of clinically-acquired CXRs within publicly available datasets could have a profound effect on algorithm performance. COVID-19 diagnosis may be inferred by an algorithm from non-anatomical features on an image such as image labels. These imaging shortcuts may be dataset-specific and limit the generalisability of AI systems. Understanding and correcting key potential biases in CXR images is therefore an essential first step prior to CXR image analysis. In this study, we propose a simple and effective step-wise approach to pre-processing a COVID-19 chest X-ray dataset to remove undesired biases. We perform ablation studies to show the impact of each individual step. The results suggest that using our proposed pipeline could increase accuracy of the baseline COVID-19 detection algorithm by up to 13%.
CVAug 20, 2023
Privileged Anatomical and Protocol Discrimination in Trackerless 3D Ultrasound ReconstructionQi Li, Ziyi Shen, Qian Li et al.
Three-dimensional (3D) freehand ultrasound (US) reconstruction without using any additional external tracking device has seen recent advances with deep neural networks (DNNs). In this paper, we first investigated two identified contributing factors of the learned inter-frame correlation that enable the DNN-based reconstruction: anatomy and protocol. We propose to incorporate the ability to represent these two factors - readily available during training - as the privileged information to improve existing DNN-based methods. This is implemented in a new multi-task method, where the anatomical and protocol discrimination are used as auxiliary tasks. We further develop a differentiable network architecture to optimise the branching location of these auxiliary tasks, which controls the ratio between shared and task-specific network parameters, for maximising the benefits from the two auxiliary tasks. Experimental results, on a dataset with 38 forearms of 19 volunteers acquired with 6 different scanning protocols, show that 1) both anatomical and protocol variances are enabling factors for DNN-based US reconstruction; 2) learning how to discriminate different subjects (anatomical variance) and predefined types of scanning paths (protocol variance) both significantly improve frame prediction accuracy, volume reconstruction overlap, accumulated tracking error and final drift, using the proposed algorithm.
CVJan 14Code
Radiomics-Integrated Deep Learning with Hierarchical Loss for Osteosarcoma Histology ClassificationYaxi Chen, Zi Ye, Shaheer U. Saeed et al.
Osteosarcoma (OS) is an aggressive primary bone malignancy. Accurate histopathological assessment of viable versus non-viable tumor regions after neoadjuvant chemotherapy is critical for prognosis and treatment planning, yet manual evaluation remains labor-intensive, subjective, and prone to inter-observer variability. Recent advances in digital pathology have enabled automated necrosis quantification. Evaluating on test data, independently sampled on patient-level, revealed that the deep learning model performance dropped significantly from the tile-level generalization ability reported in previous studies. First, this work proposes the use of radiomic features as additional input in model training. We show that, despite that they are derived from the images, such a multimodal input effectively improved the classification performance, in addition to its added benefits in interpretability. Second, this work proposes to optimize two binary classification tasks with hierarchical classes (i.e. tumor-vs-non-tumor and viable-vs-non-viable), as opposed to the alternative ``flat'' three-class classification task (i.e. non-tumor, non-viable tumor, viable tumor), thereby enabling a hierarchical loss. We show that such a hierarchical loss, with trainable weightings between the two tasks, the per-class performance can be improved significantly. Using the TCIA OS Tumor Assessment dataset, we experimentally demonstrate the benefits from each of the proposed new approaches and their combination, setting a what we consider new state-of-the-art performance on this open dataset for this application. Code and trained models: https://github.com/YaxiiC/RadiomicsOS.git.
IVJul 8, 2024
Poisson Ordinal Network for Gleason Group Estimation Using Bi-Parametric MRIYinsong Xu, Yipei Wang, Ziyi Shen et al.
The Gleason groups serve as the primary histological grading system for prostate cancer, providing crucial insights into the cancer's potential for growth and metastasis. In clinical practice, pathologists determine the Gleason groups based on specimens obtained from ultrasound-guided biopsies. In this study, we investigate the feasibility of directly estimating the Gleason groups from MRI scans to reduce otherwise required biopsies. We identify two characteristics of this task, ordinality and the resulting dependent yet unknown variances between Gleason groups. In addition to the inter- / intra- observer variability in a multi-step Gleason scoring process based on the interpretation of Gleason patterns, our MR-based prediction is also subject to specimen sampling variance and, to a lesser degree, varying MR imaging protocols. To address this challenge, we propose a novel Poisson ordinal network (PON). PONs model the prediction using a Poisson distribution and leverages Poisson encoding and Poisson focal loss to capture a learnable dependency between ordinal classes (here, Gleason groups), rather than relying solely on the numerical ground-truth (e.g. Gleason Groups 1-5 or Gleason Scores 6-10). To improve this modelling efficacy, PONs also employ contrastive learning with a memory bank to regularise intra-class variance, decoupling the memory requirement of contrast learning from the batch size. Experimental results based on the images labelled by saturation biopsies from 265 prior-biopsy-blind patients, across two tasks demonstrate the superiority and effectiveness of our proposed method.
LGJul 21, 2022
Strategising template-guided needle placement for MR-targeted prostate biopsyIani JMB Gayo, Shaheer U. Saeed, Dean C. Barratt et al.
Clinically significant prostate cancer has a better chance to be sampled during ultrasound-guided biopsy procedures, if suspected lesions found in pre-operative magnetic resonance (MR) images are used as targets. However, the diagnostic accuracy of the biopsy procedure is limited by the operator-dependent skills and experience in sampling the targets, a sequential decision making process that involves navigating an ultrasound probe and placing a series of sampling needles for potentially multiple targets. This work aims to learn a reinforcement learning (RL) policy that optimises the actions of continuous positioning of 2D ultrasound views and biopsy needles with respect to a guiding template, such that the MR targets can be sampled efficiently and sufficiently. We first formulate the task as a Markov decision process (MDP) and construct an environment that allows the targeting actions to be performed virtually for individual patients, based on their anatomy and lesions derived from MR images. A patient-specific policy can thus be optimised, before each biopsy procedure, by rewarding positive sampling in the MDP environment. Experiment results from fifty four prostate cancer patients show that the proposed RL-learned policies obtained a mean hit rate of 93% and an average cancer core length of 11 mm, which compared favourably to two alternative baseline strategies designed by humans, without hand-engineered rewards that directly maximise these clinically relevant metrics. Perhaps more interestingly, it is found that the RL agents learned strategies that were adaptive to the lesion size, where spread of the needles was prioritised for smaller lesions. Such a strategy has not been previously reported or commonly adopted in clinical practice, but led to an overall superior targeting performance when compared with intuitively designed strategies.
CVJul 22, 2022
Meta-Registration: Learning Test-Time Optimization for Single-Pair Image RegistrationZachary MC Baum, Yipeng Hu, Dean C Barratt
Neural networks have been proposed for medical image registration by learning, with a substantial amount of training data, the optimal transformations between image pairs. These trained networks can further be optimized on a single pair of test images - known as test-time optimization. This work formulates image registration as a meta-learning algorithm. Such networks can be trained by aligning the training image pairs while simultaneously improving test-time optimization efficacy; tasks which were previously considered two independent training and optimization processes. The proposed meta-registration is hypothesized to maximize the efficiency and effectiveness of the test-time optimization in the "outer" meta-optimization of the networks. For image guidance applications that often are time-critical yet limited in training data, the potentially gained speed and accuracy are compared with classical registration algorithms, registration networks without meta-learning, and single-pair optimization without test-time optimization data. Experiments are presented in this paper using clinical transrectal ultrasound image data from 108 prostate cancer patients. These experiments demonstrate the effectiveness of a meta-registration protocol, which yields significantly improved performance relative to existing learning-based methods. Furthermore, the meta-registration achieves comparable results to classical iterative methods in a fraction of the time, owing to its rapid test-time optimization process.
CVJul 22, 2022
Learning Generalized Non-Rigid Multimodal Biomedical Image Registration from Generic Point Set DataZachary MC Baum, Tamas Ungi, Christopher Schlenger et al.
Free Point Transformer (FPT) has been proposed as a data-driven, non-rigid point set registration approach using deep neural networks. As FPT does not assume constraints based on point vicinity or correspondence, it may be trained simply and in a flexible manner by minimizing an unsupervised loss based on the Chamfer Distance. This makes FPT amenable to real-world medical imaging applications where ground-truth deformations may be infeasible to obtain, or in scenarios where only a varying degree of completeness in the point sets to be aligned is available. To test the limit of the correspondence finding ability of FPT and its dependency on training data sets, this work explores the generalizability of the FPT from well-curated non-medical data sets to medical imaging data sets. First, we train FPT on the ModelNet40 dataset to demonstrate its effectiveness and the superior registration performance of FPT over iterative and learning-based point set registration methods. Second, we demonstrate superior performance in rigid and non-rigid registration and robustness to missing data. Last, we highlight the interesting generalizability of the ModelNet-trained FPT by registering reconstructed freehand ultrasound scans of the spine and generic spine models without additional training, whereby the average difference to the ground truth curvatures is 1.3 degrees, across 13 patients.
66.6IVMar 15
On the Degrees of Freedom of Gridded Control Points in Learning-Based Medical Image RegistrationWen Yan, Qianye Yang, Yipei Wang et al.
Many registration problems are ill-posed in homogeneous or noisy regions, and dense voxel-wise decoders can be unnecessarily high-dimensional. A sparse control-point parameterisation provides a compact, smooth deformation representation while reducing memory and improving stability. This work investigates the required control points for learning-based registration network development. We present GridReg, a learning-based registration framework that replaces dense voxel-wise decoding with displacement predictions at a sparse grid of control points. This design substantially cuts the parameter count and memory while retaining registration accuracy. Multiscale 3D encoder feature maps are flattened into a 1D token sequence with positional encoding to retain spatial context. The model then predicts a sparse gridded deformation field using a cross-attention module. We further introduce grid-adaptive training, enabling an adaptive model to operate at multiple grid sizes at inference without retraining. This work quantitatively demonstrates the benefits of using sparse grids. Using three data sets for registering prostate gland, pelvic organs and neurological structures, the results suggested a significant improvement with the usage of grid-controled displacement field. Alternatively, the superior registration performance was obtained using the proposed approach, with a similar or less computational cost, compared with existing algorithms that predict DDFs or displacements sampled on scattered key points.
CVJan 22
Understanding the Transfer Limits of Vision Foundation ModelsShiqi Huang, Yipei Wang, Natasha Thorley et al.
Foundation models leverage large-scale pretraining to capture extensive knowledge, demonstrating generalization in a wide range of language tasks. By comparison, vision foundation models (VFMs) often exhibit uneven improvements across downstream tasks, despite substantial computational investment. We postulate that this limitation arises from a mismatch between pretraining objectives and the demands of downstream vision-and-imaging tasks. Pretraining strategies like masked image reconstruction or contrastive learning shape representations for tasks such as recovery of generic visual patterns or global semantic structures, which may not align with the task-specific requirements of downstream applications including segmentation, classification, or image synthesis. To investigate this in a concrete real-world clinical area, we assess two VFMs, a reconstruction-focused MAE-based model (ProFound) and a contrastive-learning-based model (ProViCNet), on five prostate multiparametric MR imaging tasks, examining how such task alignment influences transfer performance, i.e., from pretraining to fine-tuning. Our findings indicate that better alignment between pretraining and downstream tasks, measured by simple divergence metrics such as maximum-mean-discrepancy (MMD) between the same features before and after fine-tuning, correlates with greater performance improvements and faster convergence, emphasizing the importance of designing and analyzing pretraining objectives with downstream applicability in mind.
CVMar 4
ProFound: A moderate-sized vision foundation model for multi-task prostate imagingYipei Wang, Yinsong Xu, Weixi Yi et al.
Many diagnostic and therapeutic clinical tasks for prostate cancer increasingly rely on multi-parametric MRI. Automating these tasks is challenging because they necessitate expert interpretations, which are difficult to scale to capitalise on modern deep learning. Although modern automated systems achieve expert-level performance in isolated tasks, their general clinical utility remains limited by the requirement of large task-specific labelled datasets. In this paper, we present ProFound, a domain-specialised vision foundation model for volumetric prostate mpMRI. ProFound is pre-trained using several variants of self-supervised approaches on a diverse, multi-institutional collection of 5,000 patients, with a total of over 22,000 unique 3D MRI volumes (over 1,800,000 2D image slices). We conducted a systematic evaluation of ProFound across a broad spectrum of $11$ downstream clinical tasks on over 3,000 independent patients, including prostate cancer detection, Gleason grading, lesion localisation, gland volume estimation, zonal and surrounding structure segmentation. Experimental results demonstrate that finetuned ProFound consistently outperforms or remains competitive with state-of-the-art specialised models and existing medical vision foundation models trained/finetuned on the same data.
IVJul 22, 2022
Rapid Lung Ultrasound COVID-19 Severity Scoring with Resource-Efficient Deep Feature ExtractionPierre Raillard, Lorenzo Cristoni, Andrew Walden et al.
Artificial intelligence-based analysis of lung ultrasound imaging has been demonstrated as an effective technique for rapid diagnostic decision support throughout the COVID-19 pandemic. However, such techniques can require days- or weeks-long training processes and hyper-parameter tuning to develop intelligent deep learning image analysis models. This work focuses on leveraging 'off-the-shelf' pre-trained models as deep feature extractors for scoring disease severity with minimal training time. We propose using pre-trained initializations of existing methods ahead of simple and compact neural networks to reduce reliance on computational capacity. This reduction of computational capacity is of critical importance in time-limited or resource-constrained circumstances, such as the early stages of a pandemic. On a dataset of 49 patients, comprising over 20,000 images, we demonstrate that the use of existing methods as feature extractors results in the effective classification of COVID-19-related pneumonia severity while requiring only minutes of training time. Our methods can achieve an accuracy of over 0.93 on a 4-level severity score scale and provides comparable per-patient region and global scores compared to expert annotated ground truths. These results demonstrate the capability for rapid deployment and use of such minimally-adapted methods for progress monitoring, patient stratification and management in clinical practice for COVID-19 patients, and potentially in other respiratory diseases.
25.7CVMar 15
Deep EM with Hierarchical Latent Label Modelling for Multi-Site Prostate Lesion SegmentationWen Yan, Yipei Wang, Shiqi Huang et al.
Label variability is a major challenge for prostate lesion segmentation. In multi-site datasets, annotations often reflect centre-specific contouring protocols, causing segmentation networks to overfit to local styles and generalise poorly to unseen sites in inference. We treat each observed annotation as a noisy observation of an underlying latent 'clean' lesion mask, and propose a hierarchical expectation-maximisation (HierEM) framework that alternates between: (1) inferring a voxel-wise posterior distribution over the latent mask, and (2) training a CNN using this posterior as a soft target and estimate site-specific sensitivity and specificity under a hierarchical prior. This hierarchical prior decomposes label-quality into a global mean with site- and case-level deviations, reducing site-specific bias by penalising the likelihood term contributed only by site deviations. Experiments on three cohorts demonstrate that the proposed hierarchical EM framework enhances cross-site generalisation compared to state-of-the-art methods. For pooled-dataset evaluation, the per-site mean DSC ranges from 29.50% to 39.69%; for leave-one-site-out generalisation, it ranges from 27.91% to 32.67%, yielding statistically significant improvements over comparison methods (p<0.039). The method also produces interpretable per-site latent label-quality estimates (sensitivity alpha ranges from 31.5% to 47.3% at specificity beta approximates 0.99), supporting post-hoc analyses of cross-site annotation variability. These results indicate that explicitly modelling site-dependent annotation can improve cross-site generalisation.
IVMar 12, 2023
Spatial Correspondence between Graph Neural Network-Segmented ImagesQian Li, Yunguan Fu, Qianye Yang et al.
Graph neural networks (GNNs) have been proposed for medical image segmentation, by predicting anatomical structures represented by graphs of vertices and edges. One such type of graph is predefined with fixed size and connectivity to represent a reference of anatomical regions of interest, thus known as templates. This work explores the potentials in these GNNs with common topology for establishing spatial correspondence, implicitly maintained during segmenting two or more images. With an example application of registering local vertebral sub-regions found in CT images, our experimental results showed that the GNN-based segmentation is capable of accurate and reliable localization of the same interventionally interesting structures between images, not limited to the segmentation classes. The reported average target registration errors of 2.2$\pm$1.3 mm and 2.7$\pm$1.4 mm, for aligning holdout test images with a reference and for aligning two test images, respectively, were by a considerable margin lower than those from the tested non-learning and learning-based registration algorithms. Further ablation studies assess the contributions towards the registration performance, from individual components in the originally segmentation-purposed network and its training algorithm. The results highlight that the proposed segmentation-in-lieu-of-registration approach shares methodological similarities with existing registration methods, such as the use of displacement smoothness constraint and point distance minimization albeit on non-grid graphs, which interestingly yielded benefits for both segmentation and registration. We, therefore, conclude that the template-based GNN segmentation can effectively establish spatial correspondence in our application, without any other dedicated registration algorithms.
CVDec 30, 2025
Learning to learn skill assessment for fetal ultrasound scanningYipei Wang, Qianye Yang, Lior Drukker et al.
Traditionally, ultrasound skill assessment has relied on expert supervision and feedback, a process known for its subjectivity and time-intensive nature. Previous works on quantitative and automated skill assessment have predominantly employed supervised learning methods, often limiting the analysis to predetermined or assumed factors considered influential in determining skill levels. In this work, we propose a novel bi-level optimisation framework that assesses fetal ultrasound skills by how well a task is performed on the acquired fetal ultrasound images, without using manually predefined skill ratings. The framework consists of a clinical task predictor and a skill predictor, which are optimised jointly by refining the two networks simultaneously. We validate the proposed method on real-world clinical ultrasound videos of scanning the fetal head. The results demonstrate the feasibility of predicting ultrasound skills by the proposed framework, which quantifies optimised task performance as a skill indicator.
LGOct 15, 2024Code
Subspace Optimization for Large Language Models with Convergence GuaranteesYutong He, Pengrui Li, Yipeng Hu et al.
Subspace optimization algorithms, such as GaLore (Zhao et al., 2024), have gained attention for pre-training and fine-tuning large language models (LLMs) due to their memory efficiency. However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we reveal that GaLore does not always converge to the optimal solution and provide an explicit counterexample to support this finding. We further explore the conditions under which GaLore achieves convergence, showing that it does so when either (i) a sufficiently large mini-batch size is used or (ii) the gradient noise is isotropic. More significantly, we introduce GoLore (Gradient random Low-rank projection), a novel variant of GaLore that provably converges in typical stochastic settings, even with standard batch sizes. Our convergence analysis extends naturally to other subspace optimization algorithms. Finally, we empirically validate our theoretical results and thoroughly test the proposed mechanisms. Codes are available at https://github.com/pkumelon/Golore.
CVMar 2
Retrieving Patient-Specific Radiomic Feature Sets for Transparent Knee MRI AssessmentYaxi Chen, Simin Ni, Jingjing Zhang et al.
Classical radiomic features are designed to quantify image appearance and intensity patterns. Compared with end-to-end deep learning (DL) models trained for disease classification, radiomics pipelines with low-dimensional parametric classifiers offer enhanced transparency and interpretability, yet often underperform because of the reliance on population-level predefined feature sets. Recent work on adaptive radiomics uses DL to predict feature weights over a radiomic pool, then thresholds these weights to retain the top-k features from large radiomic pool F (often ~10^3). However, such marginal ranking can over-admit redundant descriptors and overlook complementary feature interactions. We propose a patient-specific feature-set selection framework that predicts a single compact feature set per subject, targeting complementary and diverse evidence rather than marginal top-k features. To overcome the intractable combinatorial search space of F choose k features, our method utilizes a 2-stage retrieval strategy: randomly sample diverse candidate feature sets, then rank these sets with a learned scoring function to select a high-performing feature set for the specific patient. The system consists of a feature-set scorer, and a classifier that performs the final diagnosis. We empirically show that the proposed two-stage retrieval approximates the original exhaustive all k-feature selection. Validating on tasks including ACL tear detection and KL grading for osteoarthritis, the experimental results achieve diagnostic performance, outperforming the top-k approach with the same k values, and competitive with end-to-end DL models while maintaining high transparency. The model generates auditable feature sets that link clinical outcomes to specific anatomical regions and radiomic families, allowing clinicians to inspect which anatomical structures and quantitative descriptors drive the prediction.
CVJan 13
Interpretability and Individuality in Knee MRI: Patient-Specific Radiomic Fingerprint with Reconstructed Healthy PersonasYaxi Chen, Simin Ni, Shuai Li et al.
For automated assessment of knee MRI scans, both accuracy and interpretability are essential for clinical use and adoption. Traditional radiomics rely on predefined features chosen at the population level; while more interpretable, they are often too restrictive to capture patient-specific variability and can underperform end-to-end deep learning (DL). To address this, we propose two complementary strategies that bring individuality and interpretability: radiomic fingerprints and healthy personas. First, a radiomic fingerprint is a dynamically constructed, patient-specific feature set derived from MRI. Instead of applying a uniform population-level signature, our model predicts feature relevance from a pool of candidate features and selects only those most predictive for each patient, while maintaining feature-level interpretability. This fingerprint can be viewed as a latent-variable model of feature usage, where an image-conditioned predictor estimates usage probabilities and a transparent logistic regression with global coefficients performs classification. Second, a healthy persona synthesises a pathology-free baseline for each patient using a diffusion model trained to reconstruct healthy knee MRIs. Comparing features extracted from pathological images against their personas highlights deviations from normal anatomy, enabling intuitive, case-specific explanations of disease manifestations. We systematically compare fingerprints, personas, and their combination across three clinical tasks. Experimental results show that both approaches yield performance comparable to or surpassing state-of-the-art DL models, while supporting interpretability at multiple levels. Case studies further illustrate how these perspectives facilitate human-explainable biomarker discovery and pathology localisation.
CVOct 17, 2024Code
SAMReg: SAM-enabled Image Registration with ROI-based CorrespondenceShiqi Huang, Tingfa Xu, Ziyi Shen et al.
This paper describes a new spatial correspondence representation based on paired regions-of-interest (ROIs), for medical image registration. The distinct properties of the proposed ROI-based correspondence are discussed, in the context of potential benefits in clinical applications following image registration, compared with alternative correspondence-representing approaches, such as those based on sampled displacements and spatial transformation functions. These benefits include a clear connection between learning-based image registration and segmentation, which in turn motivates two cases of image registration approaches using (pre-)trained segmentation networks. Based on the segment anything model (SAM), a vision foundation model for segmentation, we develop a new registration algorithm SAMReg, which does not require any training (or training data), gradient-based fine-tuning or prompt engineering. The proposed SAMReg models are evaluated across five real-world applications, including intra-subject registration tasks with cardiac MR and lung CT, challenging inter-subject registration scenarios with prostate MR and retinal imaging, and an additional evaluation with a non-clinical example with aerial image registration. The proposed methods outperform both intensity-based iterative algorithms and DDF-predicting learning-based networks across tested metrics including Dice and target registration errors on anatomical structures, and further demonstrates competitive performance compared to weakly-supervised registration approaches that rely on fully-segmented training data. Open source code and examples are available at: https://github.com/sqhuang0103/SAMReg.git.
IVMay 31, 2025Code
A versatile foundation model for cine cardiac magnetic resonance image analysis tasksYunguan Fu, Wenjia Bai, Weixi Yi et al.
Here we present a versatile foundation model that can perform a range of clinically-relevant image analysis tasks, including segmentation, landmark localisation, diagnosis, and prognostication. A multi-view convolution-transformer masked autoencoder, named as CineMA, was trained on 15 million cine images from 74,916 subjects. The model was validated on multiple image analysis tasks and compared to existing models on >4,500 images from eight independent datasets with diverse population characteristics, representing the largest benchmark study for cine CMR so far. CineMA consistently outperformed conventional convolutional neural networks (CNNs) in delineating ventricular boundaries and estimating ejection fraction, a key measure of cardiac function. The improved performance was preserved, even when the model only used half of fine-tuning data. CineMA also surpassed CNNs in disease detection and matched their performance in long-axis function measurement. Interestingly, we found that CineMA can also detect cardiac changes in systemic diseases, such as diabetes, hypertension and cancer, and can also predict mortality. Finally, we assessed model fairness and demonstrated consistent model performance across demographic subgroups. These findings highlight CineMA's accuracy, learning efficiency, adaptability, and fairness, underscoring its potential as a foundation model for automated cardiac image analysis to support clinical workflow and cardiovascular research. All training and inference code and models are made publicly available at https://github.com/mathpluscode/CineMA.
CVMar 17, 2025Code
Patient-specific radiomic feature selection with reconstructed healthy persona of knee MR imagesYaxi Chen, Simin Ni, Aleksandra Ivanova et al.
Classical radiomic features have been designed to describe image appearance and intensity patterns. These features are directly interpretable and readily understood by radiologists. Compared with end-to-end deep learning (DL) models, lower dimensional parametric models that use such radiomic features offer enhanced interpretability but lower comparative performance in clinical tasks. In this study, we propose an approach where a standard logistic regression model performance is substantially improved by learning to select radiomic features for individual patients, from a pool of candidate features. This approach has potentials to maintain the interpretability of such approaches while offering comparable performance to DL. We also propose to expand the feature pool by generating a patient-specific healthy persona via mask-inpainting using a denoising diffusion model trained on healthy subjects. Such a pathology-free baseline feature set allows further opportunity in novel feature discovery and improved condition classification. We demonstrate our method on multiple clinical tasks of classifying general abnormalities, anterior cruciate ligament tears, and meniscus tears. Experimental results demonstrate that our approach achieved comparable or even superior performance than state-of-the-art DL approaches while offering added interpretability by using radiomic features extracted from images and supplemented by generating healthy personas. Example clinical cases are discussed in-depth to demonstrate the intepretability-enabled utilities such as human-explainable feature discovery and patient-specific location/view selection. These findings highlight the potentials of the combination of subject-specific feature selection with generative models in augmenting radiomic analysis for more interpretable decision-making. The codes are available at: https://github.com/YaxiiC/RadiomicsPersona.git
CVJul 17, 2025Code
Analysis of Image-and-Text Uncertainty Propagation in Multimodal Large Language Models with Cardiac MR-Based ApplicationsYucheng Tang, Yunguan Fu, Weixi Yi et al.
Multimodal large language models (MLLMs) can process and integrate information from multimodality sources, such as text and images. However, interrelationship among input modalities, uncertainties due to individual uni-modal data and potential clinical applications following such an uncertainty decomposition are yet fully understood in the context of large-scale MLLMs. In this work, we propose a multimodal uncertainty propagation model (MUPM) based on uncertainty propagation, to characterise the relationship among the uncertainties arising from image-only, text-only, and joint image-text variations in MLLM inputs. Using real clinical data consisting of cardiac MR scans and digital health records, we describe that MUPMs can be optimised robustly with a few samples. We then show that the fitted MUPMs are generalisable across different input data distributions and, perhaps surprisingly, across different downstream tasks. Such a transferability may be explained by the shared pretraining, comparatively light MLLM fine-tuning, along with the low-dimensional nature of the MUPMs. More importantly, this learned transferability, quantifying the relationship between these uncertainties, led to direct clinical applications in which uncertainties may be estimated and thus analysed robustly for varying data or even a novel set of cardiac disease prediction tasks. In addition, we show experimentally the efficiency in multimodal data required for estimating the overall uncertainty and its ability to identify redundant factors, both of which are considered practical yet clinically useful applications with the proposed MUPMs. Codes are available at https://github.com/yucheng722/MUPM.
IVJul 2, 2025Code
A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programsNiccolò McConnell, Pardeep Vasudev, Daisuke Yamada et al.
Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.
CVFeb 5, 2025Code
Tell2Reg: Establishing spatial correspondence between images by the same language promptsWen Yan, Qianye Yang, Shiqi Huang et al.
Spatial correspondence can be represented by pairs of segmented regions, such that the image registration networks aim to segment corresponding regions rather than predicting displacement fields or transformation parameters. In this work, we show that such a corresponding region pair can be predicted by the same language prompt on two different images using the pre-trained large multimodal models based on GroundingDINO and SAM. This enables a fully automated and training-free registration algorithm, potentially generalisable to a wide range of image registration tasks. In this paper, we present experimental results using one of the challenging tasks, registering inter-subject prostate MR images, which involves both highly variable intensity and morphology between patients. Tell2Reg is training-free, eliminating the need for costly and time-consuming data curation and labelling that was previously required for this registration task. This approach outperforms unsupervised learning-based registration methods tested, and has a performance comparable to weakly-supervised methods. Additional qualitative results are also presented to suggest that, for the first time, there is a potential correlation between language semantics and spatial correspondence, including the spatial invariance in language-prompted regions and the difference in language prompts between the obtained local and global correspondences. Code is available at https://github.com/yanwenCi/Tell2Reg.git.
CVMay 2, 2023Code
Expectation Maximization Pseudo LabelsMoucheng Xu, Yukun Zhou, Chen Jin et al.
In this paper, we study pseudo-labelling. Pseudo-labelling employs raw inferences on unlabelled data as pseudo-labels for self-training. We elucidate the empirical successes of pseudo-labelling by establishing a link between this technique and the Expectation Maximisation algorithm. Through this, we realise that the original pseudo-labelling serves as an empirical estimation of its more comprehensive underlying formulation. Following this insight, we present a full generalisation of pseudo-labels under Bayes' theorem, termed Bayesian Pseudo Labels. Subsequently, we introduce a variational approach to generate these Bayesian Pseudo Labels, involving the learning of a threshold to automatically select high-quality pseudo labels. In the remainder of the paper, we showcase the applications of pseudo-labelling and its generalised form, Bayesian Pseudo-Labelling, in the semi-supervised segmentation of medical images. Specifically, we focus on: 1) 3D binary segmentation of lung vessels from CT volumes; 2) 2D multi-class segmentation of brain tumours from MRI volumes; 3) 3D binary segmentation of whole brain tumours from MRI volumes; and 4) 3D binary segmentation of prostate from MRI volumes. We further demonstrate that pseudo-labels can enhance the robustness of the learned representations. The code is released in the following GitHub repository: https://github.com/moucheng2017/EMSSL
MED-PHNov 19, 2021Code
Evaluation of automated airway morphological quantification for assessing fibrosing lung diseaseAshkan Pakzad, Wing Keung Cheung, Kin Quan et al.
Abnormal airway dilatation, termed traction bronchiectasis, is a typical feature of idiopathic pulmonary fibrosis (IPF). Volumetric computed tomography (CT) imaging captures the loss of normal airway tapering in IPF. We postulated that automated quantification of airway abnormalities could provide estimates of IPF disease extent and severity. We propose AirQuant, an automated computational pipeline that systematically parcellates the airway tree into its lobes and generational branches from a deep learning based airway segmentation, deriving airway structural measures from chest CT. Importantly, AirQuant prevents the occurrence of spurious airway branches by thick wave propagation and removes loops in the airway-tree by graph search, overcoming limitations of existing airway skeletonisation algorithms. Tapering between airway segments (intertapering) and airway tortuosity computed by AirQuant were compared between 14 healthy participants and 14 IPF patients. Airway intertapering was significantly reduced in IPF patients, and airway tortuosity was significantly increased when compared to healthy controls. Differences were most marked in the lower lobes, conforming to the typical distribution of IPF-related damage. AirQuant is an open-source pipeline that avoids limitations of existing airway quantification algorithms and has clinical interpretability. Automated airway measurements may have potential as novel imaging biomarkers of IPF severity and disease extent.
CVJul 28, 2021Code
Development and evaluation of intraoperative ultrasound segmentation with negative image frames and multiple observer labelsLiam F Chalcroft, Jiongqi Qu, Sophie A Martin et al.
When developing deep neural networks for segmenting intraoperative ultrasound images, several practical issues are encountered frequently, such as the presence of ultrasound frames that do not contain regions of interest and the high variance in ground-truth labels. In this study, we evaluate the utility of a pre-screening classification network prior to the segmentation network. Experimental results demonstrate that such a classifier, minimising frame classification errors, was able to directly impact the number of false positive and false negative frames. Importantly, the segmentation accuracy on the classifier-selected frames, that would be segmented, remains comparable to or better than those from standalone segmentation networks. Interestingly, the efficacy of the pre-screening classifier was affected by the sampling methods for training labels from multiple observers, a seemingly independent problem. We show experimentally that a previously proposed approach, combining random sampling and consensus labels, may need to be adapted to perform well in our application. Furthermore, this work aims to share practical experience in developing a machine learning application that assists highly variable interventional imaging for prostate cancer patients, to present robust and reproducible open-source implementations, and to report a set of comprehensive results and analysis comparing these practical, yet important, options in a real-world clinical application.
IVApr 25, 2021Code
Learning to Address Intra-segment Misclassification in Retinal ImagingYukun Zhou, Moucheng Xu, Yipeng Hu et al.
Accurate multi-class segmentation is a long-standing challenge in medical imaging, especially in scenarios where classes share strong similarity. Segmenting retinal blood vessels in retinal photographs is one such scenario, in which arteries and veins need to be identified and differentiated from each other and from the background. Intra-segment misclassification, i.e. veins classified as arteries or vice versa, frequently occurs when arteries and veins intersect, whereas in binary retinal vessel segmentation, error rates are much lower. We thus propose a new approach that decomposes multi-class segmentation into multiple binary, followed by a binary-to-multi-class fusion network. The network merges representations of artery, vein, and multi-class feature maps, each of which are supervised by expert vessel annotation in adversarial training. A skip-connection based merging process explicitly maintains class-specific gradients to avoid gradient vanishing in deep layers, to favor the discriminative features. The results show that, our model respectively improves F1-score by 4.4\%, 5.1\%, and 4.2\% compared with three state-of-the-art deep learning based methods on DRIVE-AV, LES-AV, and HRF-AV data sets. Code: https://github.com/rmaphoh/Learning-AVSegmentation