Kamil Mikolaj

CV
h-index26
6papers
87citations
Novelty40%
AI Score42

6 Papers

IVSep 20, 2022
Generalisability of fetal ultrasound deep learning models to low-resource imaging settings in five African countries

Carla Sendra-Balcells, Víctor M. Campello, Jordina Torrents-Barrena et al.

Most artificial intelligence (AI) research have concentrated in high-income countries, where imaging data, IT infrastructures and clinical expertise are plentiful. However, slower progress has been made in limited-resource environments where medical imaging is needed. For example, in Sub-Saharan Africa the rate of perinatal mortality is very high due to limited access to antenatal screening. In these countries, AI models could be implemented to help clinicians acquire fetal ultrasound planes for diagnosis of fetal abnormalities. So far, deep learning models have been proposed to identify standard fetal planes, but there is no evidence of their ability to generalise in centres with limited access to high-end ultrasound equipment and data. This work investigates different strategies to reduce the domain-shift effect for a fetal plane classification model trained on a high-resource clinical centre and transferred to a new low-resource centre. To that end, a classifier trained with 1,792 patients from Spain is first evaluated on a new centre in Denmark in optimal conditions with 1,008 patients and is later optimised to reach the same performance in five African centres (Egypt, Algeria, Uganda, Ghana and Malawi) with 25 patients each. The results show that a transfer learning approach can be a solution to integrate small-size African samples with existing large-scale databases in developed countries. In particular, the model can be re-aligned and optimised to boost the performance on African populations by increasing the recall to $0.92\pm0.04$ and at the same time maintaining a high precision across centres. This framework shows promise for building new AI models generalisable across clinical centres with limited data acquired in challenging and heterogeneous conditions and calls for further research to develop new solutions for usability of AI in countries with less resources.

CVMar 24, 2023
Removing confounding information from fetal ultrasound images

Kamil Mikolaj, Manxi Lin, Zahra Bashir et al.

Confounding information in the form of text or markings embedded in medical images can severely affect the training of diagnostic deep learning algorithms. However, data collected for clinical purposes often have such markings embedded in them. In dermatology, known examples include drawings or rulers that are overrepresented in images of malignant lesions. In this paper, we encounter text and calipers placed on the images found in national databases containing fetal screening ultrasound scans, which correlate with standard planes to be predicted. In order to utilize the vast amounts of data available in these databases, we develop and validate a series of methods for minimizing the confounding effects of embedded text and calipers on deep learning algorithms designed for ultrasound, using standard plane classification as a test case.

CVNov 19, 2022
Explainable fetal ultrasound quality assessment with progressive concept bottleneck models

Manxi Lin, Aasa Feragen, Kamil Mikolaj et al.

The quality of fetal ultrasound screening scans directly influences the precision of biometric measurements. However, acquiring high-quality scans is labor-intensive and highly relies on the operator's skills. Considering the low contrastiveness and imaging artifacts that widely exist in ultrasound, even a dedicated deep-learning model can be vulnerable to learning from confounding information in the image. In this paper, we propose a holistic and explainable method for fetal ultrasound quality assessment, where we design a hierarchical concept bottleneck model by introducing human-readable ``concepts" into the task and imitating the sequential expert decision-making process. This hierarchical information flow forces the model to learn concepts from semantically meaningful areas: The model first passes through a layer of visual, segmentation-based concepts, and next a second layer of property concepts directly associated with the decision-making task. We consider the quality assessment to be in a more challenging but more realistic setting, with fine-grained image recognition. Experiments show that our model outperforms equivalent concept-free models on an in-house dataset, and shows better generalizability on two public benchmarks, one from Spain and one from Africa, without any fine-tuning.

IVMar 11, 2024Code
Shortcut Learning in Medical Image Segmentation

Manxi Lin, Nina Weng, Kamil Mikolaj et al.

Shortcut learning is a phenomenon where machine learning models prioritize learning simple, potentially misleading cues from data that do not generalize well beyond the training set. While existing research primarily investigates this in the realm of image classification, this study extends the exploration of shortcut learning into medical image segmentation. We demonstrate that clinical annotations such as calipers, and the combination of zero-padded convolutions and center-cropped training sets in the dataset can inadvertently serve as shortcuts, impacting segmentation accuracy. We identify and evaluate the shortcut learning on two different but common medical image segmentation tasks. In addition, we suggest strategies to mitigate the influence of shortcut learning and improve the generalizability of the segmentation models. By uncovering the presence and implications of shortcuts in medical image segmentation, we provide insights and methodologies for evaluating and overcoming this pervasive challenge and call for attention in the community for shortcuts in segmentation. Our code is public at https://github.com/nina-weng/shortcut_skinseg .

LGMay 1
A Framework for Exploring and Disentangling Intersectional Bias: A Case Study in Fetal Ultrasound

Aya Elgebaly, Joris Fournel, Benjamin Laine Jønch Jurgensen et al.

Bias in medical AI is often framed as a problem of representation. However, in image-based tasks such as fetal ultrasound, performance disparities can arise even when representation is adequate, because predictive accuracy depends strongly on image quality. Image quality is shaped by acquisition conditions and operator expertise, as well as patient-dependent factors such as maternal body mass index (BMI), all of which may correlate with sensitive demographic features. Consequently, observed disparities may reflect the combined influence of demographic, clinical, and acquisition-related factors rather than data imbalance alone, and may obscure underlying interaction or confounding effects. We propose a structured framework to explore and detect intersectional bias, combining unsupervised slice discovery, systematic factor-wise analysis, and targeted intersectional evaluation. In a case study of over 94{,}000 ultrasound images for fetal weight estimation, we analyze bias in a state-of-the-art deep learning (DL) model and the clinical standard Hadlock, a regression formula using biometric measurements. Pixel spacing (PS) -- a parameter considered suboptimal in current acquisition protocols -- emerged as a consistent driver of performance differences, with higher PS associated with improvements of up to 24\% in selected subgroups for both models. Because PS is often adapted in cases of high BMI or low gestational age (GA), this effect carries a substantial risk of confounding. Our intersectional analysis revealed that part of the PS-associated signal is explained by GA, while PS-related improvements persist across BMI strata, highlighting the importance of acquisition-aware and interaction-aware evaluation in medical AI fairness research.

CVJun 24, 2025
General Methods Make Great Domain-specific Foundation Models: A Case-study on Fetal Ultrasound

Jakob Ambsdorf, Asbjørn Munk, Sebastian Llambias et al.

With access to large-scale, unlabeled medical datasets, researchers are confronted with two questions: Should they attempt to pretrain a custom foundation model on this medical data, or use transfer-learning from an existing generalist model? And, if a custom model is pretrained, are novel methods required? In this paper we explore these questions by conducting a case-study, in which we train a foundation model on a large regional fetal ultrasound dataset of 2M images. By selecting the well-established DINOv2 method for pretraining, we achieve state-of-the-art results on three fetal ultrasound datasets, covering data from different countries, classification, segmentation, and few-shot tasks. We compare against a series of models pretrained on natural images, ultrasound images, and supervised baselines. Our results demonstrate two key insights: (i) Pretraining on custom data is worth it, even if smaller models are trained on less data, as scaling in natural image pretraining does not translate to ultrasound performance. (ii) Well-tuned methods from computer vision are making it feasible to train custom foundation models for a given medical domain, requiring no hyperparameter tuning and little methodological adaptation. Given these findings, we argue that a bias towards methodological innovation should be avoided when developing domain specific foundation models under common computational resource constraints.