Ibrahim Almakky

CV
h-index21
23papers
101citations
Novelty47%
AI Score45

23 Papers

CVAug 27, 2023Code
PECon: Contrastive Pretraining to Enhance Feature Alignment between CT and EHR Data for Improved Pulmonary Embolism Diagnosis

Santosh Sanjeev, Salwa K. Al Khatib, Mai A. Shaaban et al.

Previous deep learning efforts have focused on improving the performance of Pulmonary Embolism(PE) diagnosis from Computed Tomography (CT) scans using Convolutional Neural Networks (CNN). However, the features from CT scans alone are not always sufficient for the diagnosis of PE. CT scans along with electronic heath records (EHR) can provide a better insight into the patients condition and can lead to more accurate PE diagnosis. In this paper, we propose Pulmonary Embolism Detection using Contrastive Learning (PECon), a supervised contrastive pretraining strategy that employs both the patients CT scans as well as the EHR data, aiming to enhance the alignment of feature representations between the two modalities and leverage information to improve the PE diagnosis. In order to achieve this, we make use of the class labels and pull the sample features of the same class together, while pushing away those of the other class. Results show that the proposed work outperforms the existing techniques and achieves state-of-the-art performance on the RadFusion dataset with an F1-score of 0.913, accuracy of 0.90 and an AUROC of 0.943. Furthermore, we also explore the explainability of our approach in comparison to other methods. Our code is publicly available at https://github.com/BioMedIA-MBZUAI/PECon.

CVAug 20, 2023Code
FedSIS: Federated Split Learning with Intermediate Representation Sampling for Privacy-preserving Generalized Face Presentation Attack Detection

Naif Alkhunaizi, Koushik Srivatsan, Faris Almalik et al.

Lack of generalization to unseen domains/attacks is the Achilles heel of most face presentation attack detection (FacePAD) algorithms. Existing attempts to enhance the generalizability of FacePAD solutions assume that data from multiple source domains are available with a single entity to enable centralized training. In practice, data from different source domains may be collected by diverse entities, who are often unable to share their data due to legal and privacy constraints. While collaborative learning paradigms such as federated learning (FL) can overcome this problem, standard FL methods are ill-suited for domain generalization because they struggle to surmount the twin challenges of handling non-iid client data distributions during training and generalizing to unseen domains during inference. In this work, a novel framework called Federated Split learning with Intermediate representation Sampling (FedSIS) is introduced for privacy-preserving domain generalization. In FedSIS, a hybrid Vision Transformer (ViT) architecture is learned using a combination of FL and split learning to achieve robustness against statistical heterogeneity in the client data distributions without any sharing of raw data (thereby preserving privacy). To further improve generalization to unseen domains, a novel feature augmentation strategy called intermediate representation sampling is employed, and discriminative information from intermediate blocks of a ViT is distilled using a shared adapter network. The FedSIS approach has been evaluated on two well-known benchmarks for cross-domain FacePAD to demonstrate that it is possible to achieve state-of-the-art generalization performance without data sharing. Code: https://github.com/Naiftt/FedSIS

CVJun 26, 2023Code
FeSViBS: Federated Split Learning of Vision Transformer with Block Sampling

Faris Almalik, Naif Alkhunaizi, Ibrahim Almakky et al.

Data scarcity is a significant obstacle hindering the learning of powerful machine learning models in critical healthcare applications. Data-sharing mechanisms among multiple entities (e.g., hospitals) can accelerate model training and yield more accurate predictions. Recently, approaches such as Federated Learning (FL) and Split Learning (SL) have facilitated collaboration without the need to exchange private data. In this work, we propose a framework for medical imaging classification tasks called Federated Split learning of Vision transformer with Block Sampling (FeSViBS). The FeSViBS framework builds upon the existing federated split vision transformer and introduces a block sampling module, which leverages intermediate features extracted by the Vision Transformer (ViT) at the server. This is achieved by sampling features (patch tokens) from an intermediate transformer block and distilling their information content into a pseudo class token before passing them back to the client. These pseudo class tokens serve as an effective feature augmentation strategy and enhances the generalizability of the learned model. We demonstrate the utility of our proposed method compared to other SL and FL approaches on three publicly available medical imaging datasets: HAM1000, BloodMNIST, and Fed-ISIC2019, under both IID and non-IID settings. Code: https://github.com/faresmalik/FeSViBS

SDJun 7, 2023
Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation

Massa Baali, Ibrahim Almakky, Shady Shehata et al.

Despite major advancements in Automatic Speech Recognition (ASR), the state-of-the-art ASR systems struggle to deal with impaired speech even with high-resource languages. In Arabic, this challenge gets amplified, with added complexities in collecting data from dysarthric speakers. In this paper, we aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach. To this effect, we first propose a signal-based approach to generate dysarthric Arabic speech from healthy Arabic speech by modifying its speed and tempo. We also propose a second stage Parallel Wave Generative (PWG) adversarial model that is trained on an English dysarthric dataset to capture language-independant dysarthric speech patterns and further augment the signal-adjusted speech samples. Furthermore, we propose a fine-tuning and text-correction strategies for Arabic Conformer at different dysarthric speech severity levels. Our fine-tuned Conformer achieved 18% Word Error Rate (WER) and 17.2% Character Error Rate (CER) on synthetically generated dysarthric speech from the Arabic commonvoice speech dataset. This shows significant WER improvement of 81.8% compared to the baseline model trained solely on healthy data. We perform further validation on real English dysarthric speech showing a WER improvement of 124% compared to the baseline trained only on healthy English LJSpeech dataset.

IVNov 16, 2023Code
Multi-Task Learning Approach for Unified Biometric Estimation from Fetal Ultrasound Anomaly Scans

Mohammad Areeb Qazi, Mohammed Talha Alam, Ibrahim Almakky et al.

Precise estimation of fetal biometry parameters from ultrasound images is vital for evaluating fetal growth, monitoring health, and identifying potential complications reliably. However, the automated computerized segmentation of the fetal head, abdomen, and femur from ultrasound images, along with the subsequent measurement of fetal biometrics, remains challenging. In this work, we propose a multi-task learning approach to classify the region into head, abdomen and femur as well as estimate the associated parameters. We were able to achieve a mean absolute error (MAE) of 1.08 mm on head circumference, 1.44 mm on abdomen circumference and 1.10 mm on femur length with a classification accuracy of 99.91\% on a dataset of fetal Ultrasound images. To achieve this, we leverage a weighted joint classification and segmentation loss function to train a U-Net architecture with an added classification head. The code can be accessed through \href{https://github.com/BioMedIA-MBZUAI/Multi-Task-Learning-Approach-for-Unified-Biometric-Estimation-from-Fetal-Ultrasound-Anomaly-Scans.git}{\texttt{Github}

CVNov 14, 2025Code
MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI

Mohammad Areeb Qazi, Munachiso S Nwadike, Ibrahim Almakky et al.

Foundational models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes pre-training for every domain, modality, or task challenging. Instead of building separate models, we propose MAFM^3 (Modular Adaptation of Foundation Models for Multi-Modal Medical AI), a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities through lightweight modular components. These components serve as specialized skill sets that allow the system to flexibly activate the appropriate capability at the inference time, depending on the input type or clinical objective. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation. Empirically, we validate our approach by adapting a chest CT foundation model initially trained for classification into prognosis and segmentation modules. Our results show improved performance on both tasks. Furthermore, by incorporating PET scans, MAFM^3 achieved an improvement in the Dice score 5% compared to the respective baselines. These findings establish that foundation models, when equipped with modular components, are not inherently constrained to their initial training scope but can evolve into multitask, multimodality systems for medical imaging. The code implementation of this work can be found at https://github.com/Areeb2735/CTscan_prognosis_VLM

CVAug 15, 2023
SEDA: Self-Ensembling ViT with Defensive Distillation and Adversarial Training for robust Chest X-rays Classification

Raza Imam, Ibrahim Almakky, Salma Alrashdi et al.

Deep Learning methods have recently seen increased adoption in medical imaging applications. However, elevated vulnerabilities have been explored in recent Deep Learning solutions, which can hinder future adoption. Particularly, the vulnerability of Vision Transformer (ViT) to adversarial, privacy, and confidentiality attacks raise serious concerns about their reliability in medical settings. This work aims to enhance the robustness of self-ensembling ViTs for the tuberculosis chest x-ray classification task. We propose Self-Ensembling ViT with defensive Distillation and Adversarial training (SEDA). SEDA utilizes efficient CNN blocks to learn spatial features with various levels of abstraction from feature representations extracted from intermediate ViT blocks, that are largely unaffected by adversarial perturbations. Furthermore, SEDA leverages adversarial training in combination with defensive distillation for improved robustness against adversaries. Training using adversarial examples leads to better model generalizability and improves its ability to handle perturbations. Distillation using soft probabilities introduces uncertainty and variation into the output probabilities, making it more difficult for adversarial and privacy attacks. Extensive experiments performed with the proposed architecture and training paradigm on publicly available Tuberculosis x-ray dataset shows SOTA efficacy of SEDA compared to SEViT in terms of computational efficiency with 70x times lighter framework and enhanced robustness of +9%.

IVMar 8, 2023
FUSQA: Fetal Ultrasound Segmentation Quality Assessment

Sevim Cengiz, Ibrahim Almakky, Mohammad Yaqub

Deep learning models have been effective for various fetal ultrasound segmentation tasks. However, generalization to new unseen data has raised questions about their effectiveness for clinical adoption. Normally, a transition to new unseen data requires time-consuming and costly quality assurance processes to validate the segmentation performance post-transition. Segmentation quality assessment efforts have focused on natural images, where the problem has been typically formulated as a dice score regression task. In this paper, we propose a simplified Fetal Ultrasound Segmentation Quality Assessment (FUSQA) model to tackle the segmentation quality assessment when no masks exist to compare with. We formulate the segmentation quality assessment process as an automated classification task to distinguish between good and poor-quality segmentation masks for more accurate gestational age estimation. We validate the performance of our proposed approach on two datasets we collect from two hospitals using different ultrasound machines. We compare different architectures, with our best-performing architecture achieving over 90% classification accuracy on distinguishing between good and poor-quality segmentation masks from an unseen dataset. Additionally, there was only a 1.45-day difference between the gestational age reported by doctors and estimated based on CRL measurements using well-segmented masks. On the other hand, this difference increased and reached up to 7.73 days when we calculated CRL from the poorly segmented masks. As a result, AI-based approaches can potentially aid fetal ultrasound segmentation quality assessment and might detect poor segmentation in real-time screening in the future.

CVMay 22, 2024Code
Continual Learning in Medical Imaging: A Survey and Practical Analysis

Mohammad Areeb Qazi, Anees Ur Rehman Hashmi, Santosh Sanjeev et al.

Deep Learning has shown great success in reshaping medical imaging, yet it faces numerous challenges hindering widespread application. Issues like catastrophic forgetting and distribution shifts in the continuously evolving data stream increase the gap between research and applications. Continual Learning offers promise in addressing these hurdles by enabling the sequential acquisition of new knowledge without forgetting previous learnings in neural networks. In this survey, we comprehensively review the recent literature on continual learning in the medical domain, highlight recent trends, and point out the practical issues. Specifically, we survey the continual learning studies on classification, segmentation, detection, and other tasks in the medical domain. Furthermore, we develop a taxonomy for the reviewed studies, identify the challenges, and provide insights to overcome them. We also critically discuss the current state of continual learning in medical imaging, including identifying open problems and outlining promising future directions. We hope this survey will provide researchers with a useful overview of the developments in the field and will further increase interest in the community. To keep up with the fast-paced advancements in this field, we plan to routinely update the repository with the latest relevant papers at https://github.com/BioMedIA-MBZUAI/awesome-cl-in-medical .

CVApr 22, 2024Code
DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images

Mohammad Areeb Qazi, Ibrahim Almakky, Anees Ur Rehman Hashmi et al.

Continual learning, the ability to acquire knowledge from new data while retaining previously learned information, is a fundamental challenge in machine learning. Various approaches, including memory replay, knowledge distillation, model regularization, and dynamic network expansion, have been proposed to address this issue. Thus far, dynamic network expansion methods have achieved state-of-the-art performance at the cost of incurring significant computational overhead. This is due to the need for additional model buffers, which makes it less feasible in resource-constrained settings, particularly in the medical domain. To overcome this challenge, we propose Dynamic Model Merging, DynaMMo, a method that merges multiple networks at different stages of model training to achieve better computational efficiency. Specifically, we employ lightweight learnable modules for each task and combine them into a unified model to minimize computational overhead. DynaMMo achieves this without compromising performance, offering a cost-effective solution for continual learning in medical applications. We evaluate DynaMMo on three publicly available datasets, demonstrating its effectiveness compared to existing approaches. DynaMMo offers around 10-fold reduction in GFLOPS with a small drop of 2.76 in average accuracy when compared to state-of-the-art dynamic-based approaches. The code implementation of this work will be available upon the acceptance of this work at https://github.com/BioMedIA-MBZUAI/DynaMMo.

CVMar 18, 2024Code
MedMerge: Merging Models for Effective Transfer Learning to Medical Imaging Tasks

Ibrahim Almakky, Santosh Sanjeev, Anees Ur Rehman Hashmi et al.

Transfer learning has become a powerful tool to initialize deep learning models to achieve faster convergence and higher performance. This is especially useful in the medical imaging analysis domain, where data scarcity limits possible performance gains for deep learning models. Some advancements have been made in boosting the transfer learning performance gain by merging models starting from the same initialization. However, in the medical imaging analysis domain, there is an opportunity to merge models starting from different initializations, thus combining the features learned from different tasks. In this work, we propose MedMerge, a method whereby the weights of different models can be merged, and their features can be effectively utilized to boost performance on a new task. With MedMerge, we learn kernel-level weights that can later be used to merge the models into a single model, even when starting from different initializations. Testing on various medical imaging analysis tasks, we show that our merged model can achieve significant performance gains, with up to 7% improvement on the F1 score. The code implementation of this work is available at github.com/BioMedIA-MBZUAI/MedMerge.

CVMar 20, 2024Code
FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis

Santosh Sanjeev, Nuren Zhaksylyk, Ibrahim Almakky et al.

The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP. Model soups averages multiple fine-tuned models aiming to improve performance on In-Domain (ID) tasks and enhance robustness against Out-of-Distribution (OOD) datasets. However, applying these methods to the medical imaging domain faces challenges and results in suboptimal performance. This is primarily due to differences in error surface characteristics that stem from data complexities such as heterogeneity, domain shift, class imbalance, and distributional shifts between training and testing phases. To address this issue, we propose a hierarchical merging approach that involves local and global aggregation of models at various levels based on models' hyperparameter configurations. Furthermore, to alleviate the need for training a large number of models in the hyperparameter search, we introduce a computationally efficient method using a cyclical learning rate scheduler to produce multiple models for aggregation in the weight space. Our method demonstrates significant improvements over the model souping approach across multiple datasets (around 6% gain in HAM10000 and CheXpert datasets) while maintaining low computational costs for model generation and selection. Moreover, we achieve better results on OOD datasets than model soups. The code is available at https://github.com/BioMedIA-MBZUAI/FissionFusion.

LGNov 14, 2024Code
Rethinking Weight-Averaged Model-merging

Hu Wang, Congbo Ma, Ibrahim Almakky et al.

Model merging, particularly through weight averaging, has shown surprising effectiveness in saving computations and improving model performance without any additional training. However, the interpretability of why and how this technique works remains unclear. In this work, we reinterpret weight-averaged model merging through the lens of interpretability and provide empirical insights into the underlying mechanisms that govern its behavior. We approach the problem from three perspectives: (1) we analyze the learned weight structures and demonstrate that model weights encode structured representations that help explain the compatibility of weight averaging; (2) we compare averaging in weight space and feature space across diverse model architectures (CNNs and ViTs) and datasets, aiming to expose under which circumstances what combination paradigm will work more effectively; (3) we study the effect of parameter scaling on prediction stability, highlighting how weight averaging acts as a form of regularization that contributes to robustness. By framing these analyses in an interpretability context, our work contributes to a more transparent and systematic understanding of model merging for stakeholders interested in the safety and reliability of untrained model combination methods. The code is available at https://github.com/billhhh/Rethink-Merge.

TOMar 25, 2025Code
RP-SAM2: Refining Point Prompts for Stable Surgical Instrument Segmentation

Nuren Zhaksylyk, Ibrahim Almakky, Jay Paranjape et al.

Accurate surgical instrument segmentation is essential in cataract surgery for tasks such as skill assessment and workflow optimization. However, limited annotated data makes it difficult to develop fully automatic models. Prompt-based methods like SAM2 offer flexibility yet remain highly sensitive to the point prompt placement, often leading to inconsistent segmentations. We address this issue by introducing RP-SAM2, which incorporates a novel shift block and a compound loss function to stabilize point prompts. Our approach reduces annotator reliance on precise point positioning while maintaining robust segmentation capabilities. Experiments on the Cataract1k dataset demonstrate that RP-SAM2 improves segmentation accuracy, with a 2% mDSC gain, a 21.36% reduction in mHD95, and decreased variance across random single-point prompt results compared to SAM2. Additionally, on the CaDIS dataset, pseudo masks generated by RP-SAM2 for fine-tuning SAM2's mask decoder outperformed those generated by SAM2. These results highlight RP-SAM2 as a practical, stable and reliable solution for semi-automatic instrument segmentation in data-constrained medical settings. The code is available at https://github.com/BioMedIA-MBZUAI/RP-SAM2.

IVMar 20, 2025Code
SALT: Parameter-Efficient Fine-Tuning via Singular Value Adaptation with Low-Rank Transformation

Abdelrahman Elsayed, Sarim Hashmi, Mohammed Elseiagy et al.

The complex nature of medical image segmentation calls for models that are specifically designed to capture detailed, domain-specific features. Large foundation models offer considerable flexibility, yet the cost of fine-tuning these models remains a significant barrier. Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), efficiently update model weights with low-rank matrices but may suffer from underfitting when the chosen rank is insufficient to capture domain-specific nuances. Conversely, full-rank Singular Value Decomposition (SVD) based methods provide comprehensive updates by modifying all singular values, yet they often lack flexibility and exhibit variable performance across datasets. We propose SALT (Singular Value Adaptation with Low-Rank Transformation), a method that selectively adapts the most influential singular values using trainable scale and shift parameters while complementing this with a low-rank update for the remaining subspace. This hybrid approach harnesses the advantages of both LoRA and SVD, enabling effective adaptation without relying on increasing model size or depth. Evaluated on 5 challenging medical datasets, ranging from as few as 20 samples to 1000, SALT outperforms state-of-the-art PEFT (LoRA and SVD) by 2% to 5% in Dice with only 3.9% trainable parameters, demonstrating robust adaptation even in low-resource settings. The code for SALT is available at: https://github.com/BioMedIA-MBZUAI/SALT

CVMar 20, 2024Code
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation

Santosh Sanjeev, Fadillah Adamsyah Maani, Arsen Abzhanov et al.

With the emergence of vision language models in the medical imaging domain, numerous studies have focused on two dominant research activities: (1) report generation from Chest X-rays (CXR), and (2) synthetic scan generation from text or reports. Despite some research incorporating multi-view CXRs into the generative process, prior patient scans and reports have been generally disregarded. This can inadvertently lead to the leaving out of important medical information, thus affecting generation quality. To address this, we propose TiBiX: Leveraging Temporal information for Bidirectional X-ray and Report Generation. Considering previous scans, our approach facilitates bidirectional generation, primarily addressing two challenging problems: (1) generating the current image from the previous image and current report and (2) generating the current report based on both the previous and current images. Moreover, we extract and release a curated temporal benchmark dataset derived from the MIMIC-CXR dataset, which focuses on temporal data. Our comprehensive experiments and ablation studies explore the merits of incorporating prior CXRs and achieve state-of-the-art (SOTA) results on the report generation task. Furthermore, we attain on-par performance with SOTA image generation efforts, thus serving as a new baseline in longitudinal bidirectional CXR-to-report generation. The code is available at https://github.com/BioMedIA-MBZUAI/TiBiX.

CVApr 22, 2025Code
MedNNS: Supernet-based Medical Task-Adaptive Neural Network Search

Lotfi Abdelkrim Mecharbat, Ibrahim Almakky, Martin Takac et al.

Deep learning (DL) has achieved remarkable progress in the field of medical imaging. However, adapting DL models to medical tasks remains a significant challenge, primarily due to two key factors: (1) architecture selection, as different tasks necessitate specialized model designs, and (2) weight initialization, which directly impacts the convergence speed and final performance of the models. Although transfer learning from ImageNet is a widely adopted strategy, its effectiveness is constrained by the substantial differences between natural and medical images. To address these challenges, we introduce Medical Neural Network Search (MedNNS), the first Neural Network Search framework for medical imaging applications. MedNNS jointly optimizes architecture selection and weight initialization by constructing a meta-space that encodes datasets and models based on how well they perform together. We build this space using a Supernetwork-based approach, expanding the model zoo size by 51x times over previous state-of-the-art (SOTA) methods. Moreover, we introduce rank loss and Fréchet Inception Distance (FID) loss into the construction of the space to capture inter-model and inter-dataset relationships, thereby achieving more accurate alignment in the meta-space. Experimental results across multiple datasets demonstrate that MedNNS significantly outperforms both ImageNet pre-trained DL models and SOTA Neural Architecture Search (NAS) methods, achieving an average accuracy improvement of 1.7% across datasets while converging substantially faster. The code and the processed meta-space is available at https://github.com/BioMedIA-MBZUAI/MedNNS.

IVMar 14, 2024Code
XReal: Realistic Anatomy and Pathology-Aware X-ray Generation via Controllable Diffusion Model

Anees Ur Rehman Hashmi, Ibrahim Almakky, Mohammad Areeb Qazi et al.

Large-scale generative models have demonstrated impressive capabilities in producing visually compelling images, with increasing applications in medical imaging. However, they continue to grapple with hallucination challenges and the generation of anatomically inaccurate outputs. These limitations are mainly due to the reliance on textual inputs and lack of spatial control over the generated images, hindering the potential usefulness of such models in real-life settings. In this work, we present XReal, a novel controllable diffusion model for generating realistic chest X-ray images through precise anatomy and pathology location control. Our lightweight method comprises an Anatomy Controller and a Pathology Controller to introduce spatial control over anatomy and pathology in a pre-trained Text-to-Image Diffusion Model, respectively, without fine-tuning the model. XReal outperforms state-of-the-art X-ray diffusion models in quantitative metrics and radiologists' ratings, showing significant gains in anatomy and pathology realism. Our model holds promise for advancing generative models in medical imaging, offering greater precision and adaptability while inviting further exploration in this evolving field. The code and pre-trained model weights are publicly available at https://github.com/BioMedIA-MBZUAI/XReal.

CVFeb 27, 2025
In-Model Merging for Enhancing the Robustness of Medical Imaging Classification Models

Hu Wang, Ibrahim Almakky, Congbo Ma et al.

Model merging is an effective strategy to merge multiple models for enhancing model performances, and more efficient than ensemble learning as it will not introduce extra computation into inference. However, limited research explores if the merging process can occur within one model and enhance the model's robustness, which is particularly critical in the medical image domain. In the paper, we are the first to propose in-model merging (InMerge), a novel approach that enhances the model's robustness by selectively merging similar convolutional kernels in the deep layers of a single convolutional neural network (CNN) during the training process for classification. We also analytically reveal important characteristics that affect how in-model merging should be performed, serving as an insightful reference for the community. We demonstrate the feasibility and effectiveness of this technique for different CNN architectures on 4 prevalent datasets. The proposed InMerge-trained model surpasses the typically-trained model by a substantial margin. The code will be made public.

IVAug 19, 2025
UNICON: UNIfied CONtinual Learning for Medical Foundational Models

Mohammad Areeb Qazi, Munachiso S Nwadike, Ibrahim Almakky et al.

Foundational models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes pre-training for every domain, modality, or task challenging. Continual learning offers a solution by fine-tuning a model sequentially on different domains or tasks, enabling it to integrate new knowledge without requiring large datasets for each training phase. In this paper, we propose UNIfied CONtinual Learning for Medical Foundational Models (UNICON), a framework that enables the seamless adaptation of foundation models to diverse domains, tasks, and modalities. Unlike conventional adaptation methods that treat these changes in isolation, UNICON provides a unified, perpetually expandable framework. Through careful integration, we show that foundation models can dynamically expand across imaging modalities, anatomical regions, and clinical objectives without catastrophic forgetting or task interference. Empirically, we validate our approach by adapting a chest CT foundation model initially trained for classification to a prognosis and segmentation task. Our results show improved performance across both additional tasks. Furthermore, we continually incorporated PET scans and achieved a 5\% improvement in Dice score compared to respective baselines. These findings establish that foundation models are not inherently constrained to their initial training scope but can evolve, paving the way toward generalist AI models for medical imaging.

IVJan 16, 2022
Self-Supervision and Multi-Task Learning: Challenges in Fine-Grained COVID-19 Multi-Class Classification from Chest X-rays

Muhammad Ridzuan, Ameera Ali Bawazir, Ivo Gollini Navarette et al.

Quick and accurate diagnosis is of paramount importance to mitigate the effects of COVID-19 infection, particularly for severe cases. Enormous effort has been put towards developing deep learning methods to classify and detect COVID-19 infections from chest radiography images. However, recently some questions have been raised surrounding the clinical viability and effectiveness of such methods. In this work, we investigate the impact of multi-task learning (classification and segmentation) on the ability of CNNs to differentiate between various appearances of COVID-19 infections in the lung. We also employ self-supervised pre-training approaches, namely MoCo and inpainting-CXR, to eliminate the dependence on expensive ground truth annotations for COVID-19 classification. Finally, we conduct a critical evaluation of the models to assess their deploy-readiness and provide insights into the difficulties of fine-grained COVID-19 multi-class classification from chest X-rays.

SPNov 22, 2020
Self-Supervised Transformers for Activity Classification using Ambient Sensors

Luke Hicks, Ariel Ruiz-Garcia, Vasile Palade et al.

Providing care for ageing populations is an onerous task, and as life expectancy estimates continue to rise, the number of people that require senior care is growing rapidly. This paper proposes a methodology based on Transformer Neural Networks to classify the activities of a resident within an ambient sensor based environment. We also propose a methodology to pre-train Transformers in a self-supervised manner, as a hybrid autoencoder-classifier model instead of using contrastive loss. The social impact of the research is considered with wider benefits of the approach and next steps for identifying transitions in human behaviour. In recent years there has been an increasing drive for integrating sensor based technologies within care facilities for data collection. This allows for employing machine learning for many aspects including activity recognition and anomaly detection. Due to the sensitivity of healthcare environments, some methods of data collection used in current research are considered to be intrusive within the senior care industry, including cameras for image based activity recognition, and wearables for activity tracking, but recent studies have shown that using these methods commonly result in poor data quality due to the lack of resident interest in participating in data gathering. This has led to a focus on ambient sensors, such as binary PIR motion, connected domestic appliances, and electricity and water metering. By having consistency in ambient data collection, the quality of data is considerably more reliable, presenting the opportunity to perform classification with enhanced accuracy. Therefore, in this research we looked to find an optimal way of using deep learning to classify human activity with ambient sensor data.

LGNov 22, 2020
Generative Adversarial Stacked Autoencoders

Ariel Ruiz-Garcia, Ibrahim Almakky, Vasile Palade et al.

Generative Adversarial Networks (GANs) have become predominant in image generation tasks. Their success is attributed to the training regime which employs two models: a generator G and discriminator D that compete in a minimax zero sum game. Nonetheless, GANs are difficult to train due to their sensitivity to hyperparameter and parameter initialisation, which often leads to vanishing gradients, non-convergence, or mode collapse, where the generator is unable to create samples with different variations. In this work, we propose a novel Generative Adversarial Stacked Convolutional Autoencoder(GASCA) model and a generative adversarial gradual greedy layer-wise learning algorithm de-signed to train Adversarial Autoencoders in an efficient and incremental manner. Our training approach produces images with significantly lower reconstruction error than vanilla joint training.