CVJun 1, 2022
Dual-stream spatiotemporal networks with feature sharing for monitoring animals in the home cageEzechukwu I. Nwokedi, Rasneer S. Bains, Luc Bidaut et al.
This paper presents a spatiotemporal deep learning approach for mouse behavioural classification in the home-cage. Using a series of dual-stream architectures with assorted modifications to increase performance, we introduce a novel feature sharing approach that jointly processes the streams at regular intervals throughout the network. To investigate the efficacy of this approach, models were evaluated by dissociating the streams and training/testing in the same rigorous manner as the main classifiers. Using an annotated, publicly available dataset of a singly-housed mice, we achieve prediction accuracy of 86.47% using an ensemble of a Inception-based network and an attention-based network, both of which utilize this feature sharing. We also demonstrate through ablation studies that for all models, the feature-sharing architectures consistently perform better than conventional ones having separate streams. The best performing models were further evaluated on other activity datasets, both mouse and human. Future work will investigate the effectiveness of feature sharing to behavioural classification in the unsupervised anomaly detection domain.
CVMar 25
Causal Transfer in Medical Image AnalysisMohammed M. Abdelsamea, Daniel Tweneboah Anyimadu, Tasneem Selim et al.
Medical imaging models frequently fail when deployed across hospitals, scanners, populations, or imaging protocols due to domain shift, limiting their clinical reliability. While transfer learning and domain adaptation address such shifts statistically, they often rely on spurious correlations that break under changing conditions. On the other hand, causal inference provides a principled way to identify invariant mechanisms that remain stable across environments. This survey introduces and systematises Causal Transfer Learning (CTL) for medical image analysis. This paradigm integrates causal reasoning with cross-domain representation learning to enable robust and generalisable clinical AI. We frame domain shift as a causal problem and analyse how structural causal models, invariant risk minimisation, and counterfactual reasoning can be embedded within transfer learning pipelines. We studied spanning classification, segmentation, reconstruction, anomaly detection, and multimodal imaging, and organised them by task, shift type, and causal assumption. A unified taxonomy is proposed that connects causal frameworks and transfer mechanisms. We further summarise datasets, benchmarks, and empirical gains, highlighting when and why causal transfer outperforms correlation-based domain adaptation. Finally, we discuss how CTL supports fairness, robustness, and trustworthy deployment in multi-institutional and federated settings, and outline open challenges and research directions for clinically reliable medical imaging AI.
CVNov 4, 2022
Rethinking the transfer learning for FCN based polyp segmentation in colonoscopyYan Wen, Lei Zhang, Xiangli Meng et al.
Besides the complex nature of colonoscopy frames with intrinsic frame formation artefacts such as light reflections and the diversity of polyp types/shapes, the publicly available polyp segmentation training datasets are limited, small and imbalanced. In this case, the automated polyp segmentation using a deep neural network remains an open challenge due to the overfitting of training on small datasets. We proposed a simple yet effective polyp segmentation pipeline that couples the segmentation (FCN) and classification (CNN) tasks. We find the effectiveness of interactive weight transfer between dense and coarse vision tasks that mitigates the overfitting in learning. And It motivates us to design a new training scheme within our segmentation pipeline. Our method is evaluated on CVC-EndoSceneStill and Kvasir-SEG datasets. It achieves 4.34% and 5.70% Polyp-IoU improvements compared to the state-of-the-art methods on the EndoSceneStill and Kvasir-SEG datasets, respectively.
CVJul 7, 2025Code
Parameterized Diffusion Optimization enabled Autoregressive Ordinal Regression for Diabetic Retinopathy GradingQinkai Yu, Wei Zhou, Hantao Liu et al.
As a long-term complication of diabetes, diabetic retinopathy (DR) progresses slowly, potentially taking years to threaten vision. An accurate and robust evaluation of its severity is vital to ensure prompt management and care. Ordinal regression leverages the underlying inherent order between categories to achieve superior performance beyond traditional classification. However, there exist challenges leading to lower DR classification performance: 1) The uneven distribution of DR severity levels, characterized by a long-tailed pattern, adds complexity to the grading process. 2)The ambiguity in defining category boundaries introduces additional challenges, making the classification process more complex and prone to inconsistencies. This work proposes a novel autoregressive ordinal regression method called AOR-DR to address the above challenges by leveraging the clinical knowledge of inherent ordinal information in DR grading dataset settings. Specifically, we decompose the DR grading task into a series of ordered steps by fusing the prediction of the previous steps with extracted image features as conditions for the current prediction step. Additionally, we exploit the diffusion process to facilitate conditional probability modeling, enabling the direct use of continuous global image features for autoregression without relearning contextual information from patch-level features. This ensures the effectiveness of the autoregressive process and leverages the capabilities of pre-trained large-scale foundation models. Extensive experiments were conducted on four large-scale publicly available color fundus datasets, demonstrating our model's effectiveness and superior performance over six recent state-of-the-art ordinal regression methods. The implementation code is available at https://github.com/Qinkaiyu/AOR-DR.
IVFeb 6, 2024Code
Pushing the limits of cell segmentation models for imaging mass cytometryKimberley M. Bird, Xujiong Ye, Alan M. Race et al.
Imaging mass cytometry (IMC) is a relatively new technique for imaging biological tissue at subcellular resolution. In recent years, learning-based segmentation methods have enabled precise quantification of cell type and morphology, but typically rely on large datasets with fully annotated ground truth (GT) labels. This paper explores the effects of imperfect labels on learning-based segmentation models and evaluates the generalisability of these models to different tissue types. Our results show that removing 50% of cell annotations from GT masks only reduces the dice similarity coefficient (DSC) score to 0.874 (from 0.889 achieved by a model trained on fully annotated GT masks). This implies that annotation time can in fact be reduced by at least half without detrimentally affecting performance. Furthermore, training our single-tissue model on imperfect labels only decreases DSC by 0.031 on an unseen tissue type compared to its multi-tissue counterpart, with negligible qualitative differences in segmentation. Additionally, bootstrapping the worst-performing model (with 5% of cell annotations) a total of ten times improves its original DSC score of 0.720 to 0.829. These findings imply that less time and work can be put into the process of producing comparable segmentation models; this includes eliminating the need for multiple IMC tissue types during training, whilst also providing the potential for models with very few labels to improve on themselves. Source code is available on GitHub: https://github.com/kimberley/ISBI2024.
CVJul 23, 2025Code
CAPRI-CT: Causal Analysis and Predictive Reasoning for Image Quality Optimization in Computed TomographySneha George Gnanakalavathy, Hairil Abdul Razak, Robert Meertens et al.
In computed tomography (CT), achieving high image quality while minimizing radiation exposure remains a key clinical challenge. This paper presents CAPRI-CT, a novel causal-aware deep learning framework for Causal Analysis and Predictive Reasoning for Image Quality Optimization in CT imaging. CAPRI-CT integrates image data with acquisition metadata (such as tube voltage, tube current, and contrast agent types) to model the underlying causal relationships that influence image quality. An ensemble of Variational Autoencoders (VAEs) is employed to extract meaningful features and generate causal representations from observational data, including CT images and associated imaging parameters. These input features are fused to predict the Signal-to-Noise Ratio (SNR) and support counterfactual inference, enabling what-if simulations, such as changes in contrast agents (types and concentrations) or scan parameters. CAPRI-CT is trained and validated using an ensemble learning approach, achieving strong predictive performance. By facilitating both prediction and interpretability, CAPRI-CT provides actionable insights that could help radiologists and technicians design more efficient CT protocols without repeated physical scans. The source code and dataset are publicly available at https://github.com/SnehaGeorge22/capri-ct.
IVJun 5, 2025Code
Deep histological synthesis from mass spectrometry imaging for multimodal registrationKimberley M. Bird, Xujiong Ye, Alan M. Race et al.
Registration of histological and mass spectrometry imaging (MSI) allows for more precise identification of structural changes and chemical interactions in tissue. With histology and MSI having entirely different image formation processes and dimensionalities, registration of the two modalities remains an ongoing challenge. This work proposes a solution that synthesises histological images from MSI, using a pix2pix model, to effectively enable unimodal registration. Preliminary results show promising synthetic histology images with limited artifacts, achieving increases in mutual information (MI) and structural similarity index measures (SSIM) of +0.924 and +0.419, respectively, compared to a baseline U-Net model. Our source code is available on GitHub: https://github.com/kimberley/MIUA2025.
IVNov 4, 2020Code
DR-Unet104 for Multimodal MRI brain tumor segmentationJordan Colman, Lei Zhang, Wenting Duan et al.
In this paper we propose a 2D deep residual Unet with 104 convolutional layers (DR-Unet104) for lesion segmentation in brain MRIs. We make multiple additions to the Unet architecture, including adding the 'bottleneck' residual block to the Unet encoder and adding dropout after each convolution block stack. We verified the effect of introducing the regularisation of dropout with small rate (e.g. 0.2) on the architecture, and found a dropout of 0.2 improved the overall performance compared to no dropout, or a dropout of 0.5. We evaluated the proposed architecture as part of the Multimodal Brain Tumor Segmentation (BraTS) 2020 Challenge and compared our method to DeepLabV3+ with a ResNet-V2-152 backbone. We found that the DR-Unet104 achieved a mean dice score coefficient of 0.8862, 0.6756 and 0.6721 for validation data, whole tumor, enhancing tumor and tumor core respectively, an overall improvement on 0.8770, 0.65242 and 0.68134 achieved by DeepLabV3+. Our method produced a final mean DSC of 0.8673, 0.7514 and 0.7983 on whole tumor, enhancing tumor and tumor core on the challenge's testing data. We produced a competitive lesion segmentation architecture, despite only 2D convolutions, having the added benefit that it can be used on lower power computers than a 3D architecture. The source code and trained model for this work is openly available at https://github.com/jordan-colman/DR-Unet104.
CVJul 13, 2025
Memory-Augmented SAM2 for Training-Free Surgical Video SegmentationMing Yin, Fu Wang, Xujiong Ye et al.
Surgical video segmentation is a critical task in computer-assisted surgery, essential for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has demonstrated remarkable advancements in both image and video segmentation. However, the inherent limitations of SAM2's greedy selection memory design are amplified by the unique properties of surgical videos-rapid instrument movement, frequent occlusion, and complex instrument-tissue interaction-resulting in diminished performance in the segmentation of complex, long videos. To address these challenges, we introduce Memory Augmented (MA)-SAM2, a training-free video object segmentation strategy, featuring novel context-aware and occlusion-resilient memory models. MA-SAM2 exhibits strong robustness against occlusions and interactions arising from complex instrument movements while maintaining accuracy in segmenting objects throughout videos. Employing a multi-target, single-loop, one-prompt inference further enhances the efficiency of the tracking process in multi-instrument videos. Without introducing any additional parameters or requiring further training, MA-SAM2 achieved performance improvements of 4.36% and 6.1% over SAM2 on the EndoVis2017 and EndoVis2018 datasets, respectively, demonstrating its potential for practical surgical applications.
CVMar 18, 2025
Validation of Human Pose Estimation and Human Mesh Recovery for Extracting Clinically Relevant Motion Data from VideosKai Armstrong, Alexander Rodrigues, Alexander P. Willmott et al.
This work aims to discuss the current landscape of kinematic analysis tools, ranging from the state-of-the-art in sports biomechanics such as inertial measurement units (IMUs) and retroreflective marker-based optical motion capture (MoCap) to more novel approaches from the field of computing such as human pose estimation and human mesh recovery. Primarily, this comparative analysis aims to validate the use of marker-less MoCap techniques in a clinical setting by showing that these marker-less techniques are within a reasonable range for kinematics analysis compared to the more cumbersome and less portable state-of-the-art tools. Not only does marker-less motion capture using human pose estimation produce results in-line with the results of both the IMU and MoCap kinematics but also benefits from a reduced set-up time and reduced practical knowledge and expertise to set up. Overall, while there is still room for improvement when it comes to the quality of the data produced, we believe that this compromise is within the room of error that these low-speed actions that are used in small clinical tests.
IVFeb 5, 2025
MetaFE-DE: Learning Meta Feature Embedding for Depth Estimation from Monocular Endoscopic ImagesDawei Lu, Deqiang Xiao, Danni Ai et al.
Depth estimation from monocular endoscopic images presents significant challenges due to the complexity of endoscopic surgery, such as irregular shapes of human soft tissues, as well as variations in lighting conditions. Existing methods primarily estimate the depth information from RGB images directly, and often surffer the limited interpretability and accuracy. Given that RGB and depth images are two views of the same endoscopic surgery scene, in this paper, we introduce a novel concept referred as ``meta feature embedding (MetaFE)", in which the physical entities (e.g., tissues and surgical instruments) of endoscopic surgery are represented using the shared features that can be alternatively decoded into RGB or depth image. With this concept, we propose a two-stage self-supervised learning paradigm for the monocular endoscopic depth estimation. In the first stage, we propose a temporal representation learner using diffusion models, which are aligned with the spatial information through the cross normalization to construct the MetaFE. In the second stage, self-supervised monocular depth estimation with the brightness calibration is applied to decode the meta features into the depth image. Extensive evaluation on diverse endoscopic datasets demonstrates that our approach outperforms the state-of-the-art method in depth estimation, achieving superior accuracy and generalization. The source code will be publicly available.
CVMay 28, 2021
Unsupervised detection of mouse behavioural anomalies using two-stream convolutional autoencodersEzechukwu I Nwokedi, Rasneer S Bains, Luc Bidaut et al.
This paper explores the application of unsupervised learning to detecting anomalies in mouse video data. The two models presented in this paper are a dual-stream, 3D convolutional autoencoder (with residual connections) and a dual-stream, 2D convolutional autoencoder. The publicly available dataset used here contains twelve videos of single home-caged mice alongside frame-level annotations. Under the pretext that the autoencoder only sees normal events, the video data was handcrafted to treat each behaviour as a pseudo-anomaly thereby eliminating them from the others during training. The results are presented for one conspicuous behaviour (hang) and one inconspicuous behaviour (groom). The performance of these models is compared to a single stream autoencoder and a supervised learning model, which are both based on the custom CAE. Both models are also tested on the CUHK Avenue dataset were found to perform as well as some state-of-the-art architectures.
CVSep 13, 2019
MRI Brain Tumor Segmentation using Random Forests and Fully Convolutional NetworksMohammadreza Soltaninejad, Lei Zhang, Tryphon Lambrou et al.
In this paper, we propose a novel learning based method for automated segmentation of brain tumor in multimodal MRI images, which incorporates two sets of machine -learned and hand crafted features. Fully convolutional networks (FCN) forms the machine learned features and texton based features are considered as hand-crafted features. Random forest (RF) is used to classify the MRI image voxels into normal brain tissues and different parts of tumors, i.e. edema, necrosis and enhancing tumor. The method was evaluated on BRATS 2017 challenge dataset. The results show that the proposed method provides promising segmentations. The mean Dice overlap measure for automatic brain tumor segmentation against ground truth is 0.86, 0.78 and 0.66 for whole tumor, core and enhancing tumor, respectively.
CYMay 26, 2019
A hybrid model for predicting human physical activity status from lifelogging dataJi Ni, Bowei Chen, Nigel M. Allinson et al.
One trend in the recent healthcare transformations is people are encouraged to monitor and manage their health based on their daily diets and physical activity habits. However, much attention of the use of operational research and analytical models in healthcare has been paid to the systematic level such as country or regional policy making or organisational issues. This paper proposes a model concerned with healthcare analytics at the individual level, which can predict human physical activity status from sequential lifelogging data collected from wearable sensors. The model has a two-stage hybrid structure (in short, MOGP-HMM) -- a multi-objective genetic programming (MOGP) algorithm in the first stage to reduce the dimensions of lifelogging data and a hidden Markov model (HMM) in the second stage for activity status prediction over time. It can be used as a decision support tool to provide real-time monitoring, statistical analysis and personalized advice to individuals, encouraging positive attitudes towards healthy lifestyles. We validate the model with the real data collected from a group of participants in the UK, and compare it with other popular two-stage hybrid models. Our experimental results show that the MOGP-HMM can achieve comparable performance. To the best of our knowledge, this is the very first study that uses the MOGP in the hybrid two-stage structure for individuals' activity status prediction. It fits seamlessly with the current trend in the UK healthcare transformation of patient empowerment as well as contributing to a strategic development for more efficient and cost-effective provision of healthcare.
CVDec 12, 2018
Automatic individual pig detection and tracking in surveillance videosLei Zhang, Helen Gray, Xujiong Ye et al.
Individual pig detection and tracking is an important requirement in many video-based pig monitoring applications. However, it still remains a challenging task in complex scenes, due to problems of light fluctuation, similar appearances of pigs, shape deformations and occlusions. To tackle these problems, we propose a robust real time multiple pig detection and tracking method which does not require manual marking or physical identification of the pigs, and works under both daylight and infrared light conditions. Our method couples a CNN-based detector and a correlation filter-based tracker via a novel hierarchical data association algorithm. The detector gains the best accuracy/speed trade-off by using the features derived from multiple layers at different scales in a one-stage prediction network. We define a tag-box for each pig as the tracking target, and the multiple object tracking is conducted in a key-points tracking manner using learned correlation filters. Under challenging conditions, the tracking failures are modelled based on the relations between responses of detector and tracker, and the data association algorithm allows the detection hypotheses to be refined, meanwhile the drifted tracks can be corrected by probing the tracking failures followed by the re-initialization of tracking. As a result, the optimal tracklets can sequentially grow with on-line refined detections, and tracking fragments are correctly integrated into respective tracks while keeping the original identifications. Experiments with a dataset captured from a commercial farm show that our method can robustly detect and track multiple pigs under challenging conditions. The promising performance of the proposed method also demonstrates a feasibility of long-term individual pig tracking in a complex environment and thus promises a commercial potential.
CVNov 5, 2018
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS ChallengeSpyridon Bakas, Mauricio Reyes, Andras Jakab et al.
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
CVMay 26, 2017
Fully Automatic Segmentation and Objective Assessment of Atrial Scars for Longstanding Persistent Atrial Fibrillation Patients Using Late Gadolinium-Enhanced MRIGuang Yang, Xiahai Zhuang, Habib Khan et al.
Purpose: Atrial fibrillation (AF) is the most common cardiac arrhythmia and is correlated with increased morbidity and mortality. It is associated with atrial fibrosis, which may be assessed non-invasively using late gadolinium-enhanced (LGE) magnetic resonance imaging (MRI) where scar tissue is visualised as a region of signal enhancement. In this study, we proposed a novel fully automatic pipeline to achieve an accurate and objective atrial scarring segmentation and assessment of LGE MRI scans for the AF patients. Methods: Our fully automatic pipeline uniquely combined: (1) a multi-atlas based whole heart segmentation (MA-WHS) to determine the cardiac anatomy from an MRI Roadmap acquisition which is then mapped to LGE MRI, and (2) a super-pixel and supervised learning based approach to delineate the distribution and extent of atrial scarring in LGE MRI. Results: Both our MA-WHS and atrial scarring segmentation showed accurate delineations of cardiac anatomy (mean Dice = 89%) and atrial scarring (mean Dice =79%) respectively compared to the established ground truth from manual segmentation. Compared with previously studied methods with manual interventions, our innovative pipeline demonstrated comparable results, but was computed fully automatically. Conclusion: The proposed segmentation methods allow LGE MRI to be used as an objective assessment tool for localisation, visualisation and quantification of atrial scarring.
CVMay 19, 2017
Deep De-Aliasing for Fast Compressive Sensing MRISimiao Yu, Hao Dong, Guang Yang et al.
Fast Magnetic Resonance Imaging (MRI) is highly in demand for many clinical applications in order to reduce the scanning cost and improve the patient experience. This can also potentially increase the image quality by reducing the motion artefacts and contrast washout. However, once an image field of view and the desired resolution are chosen, the minimum scanning time is normally determined by the requirement of acquiring sufficient raw data to meet the Nyquist-Shannon sampling criteria. Compressive Sensing (CS) theory has been perfectly matched to the MRI scanning sequence design with much less required raw data for the image reconstruction. Inspired by recent advances in deep learning for solving various inverse problems, we propose a conditional Generative Adversarial Networks-based deep learning framework for de-aliasing and reconstructing MRI images from highly undersampled data with great promise to accelerate the data acquisition process. By coupling an innovative content loss with the adversarial loss our de-aliasing results are more realistic. Furthermore, we propose a refinement learning procedure for training the generator network, which can stabilise the training with fast convergence and less parameter tuning. We demonstrate that the proposed framework outperforms state-of-the-art CS-MRI methods, in terms of reconstruction error and perceptual image quality. In addition, our method can reconstruct each image in 0.22ms--0.37ms, which is promising for real-time applications.
CVApr 26, 2017
Multimodal MRI brain tumor segmentation using random forests with features learned from fully convolutional neural networkMohammadreza Soltaninejad, Lei Zhang, Tryphon Lambrou et al.
In this paper, we propose a novel learning based method for automated segmenta-tion of brain tumor in multimodal MRI images. The machine learned features from fully convolutional neural network (FCN) and hand-designed texton fea-tures are used to classify the MRI image voxels. The score map with pixel-wise predictions is used as a feature map which is learned from multimodal MRI train-ing dataset using the FCN. The learned features are then applied to random for-ests to classify each MRI image voxel into normal brain tissues and different parts of tumor. The method was evaluated on BRATS 2013 challenge dataset. The results show that the application of the random forest classifier to multimodal MRI images using machine-learned features based on FCN and hand-designed features based on textons provides promising segmentations. The Dice overlap measure for automatic brain tumor segmentation against ground truth is 0.88, 080 and 0.73 for complete tumor, core and enhancing tumor, respectively.