LGFeb 19
A feature-stable and explainable machine learning framework for trustworthy decision-making under incomplete clinical dataJustyna Andrys-Olek, Paulina Tworek, Luca Gherardini et al.
Machine learning models are increasingly applied to biomedical data, yet their adoption in high stakes domains remains limited by poor robustness, limited interpretability, and instability of learned features under realistic data perturbations, such as missingness. In particular, models that achieve high predictive performance may still fail to inspire trust if their key features fluctuate when data completeness changes, undermining reproducibility and downstream decision-making. Here, we present CACTUS (Comprehensive Abstraction and Classification Tool for Uncovering Structures), an explainable machine learning framework explicitly designed to address these challenges in small, heterogeneous, and incomplete clinical datasets. CACTUS integrates feature abstraction, interpretable classification, and systematic feature stability analysis to quantify how consistently informative features are preserved as data quality degrades. Using a real-world haematuria cohort comprising 568 patients evaluated for bladder cancer, we benchmark CACTUS against widely used machine learning approaches, including random forests and gradient boosting methods, under controlled levels of randomly introduced missing data. We demonstrate that CACTUS achieves competitive or superior predictive performance while maintaining markedly higher stability of top-ranked features as missingness increases, including in sex-stratified analyses. Our results show that feature stability provides information complementary to conventional performance metrics and is essential for assessing the trustworthiness of machine learning models applied to biomedical data. By explicitly quantifying robustness to missing data and prioritising interpretable, stable features, CACTUS offers a generalizable framework for trustworthy data-driven decision support.
AINov 19, 2025
Balancing Natural Language Processing Accuracy and Normalisation in Extracting Medical InsightsPaulina Tworek, Miłosz Bargieł, Yousef Khan et al.
Extracting structured medical insights from unstructured clinical text using Natural Language Processing (NLP) remains an open challenge in healthcare, particularly in non-English contexts where resources are scarce. This study presents a comparative analysis of NLP low-compute rule-based methods and Large Language Models (LLMs) for information extraction from electronic health records (EHR) obtained from the Voivodeship Rehabilitation Hospital for Children in Ameryka, Poland. We evaluate both approaches by extracting patient demographics, clinical findings, and prescribed medications while examining the effects of lack of text normalisation and translation-induced information loss. Results demonstrate that rule-based methods provide higher accuracy in information retrieval tasks, particularly for age and sex extraction. However, LLMs offer greater adaptability and scalability, excelling in drug name recognition. The effectiveness of the LLMs was compared with texts originally in Polish and those translated into English, assessing the impact of translation. These findings highlight the trade-offs between accuracy, normalisation, and computational cost when deploying NLP in healthcare settings. We argue for hybrid approaches that combine the precision of rule-based systems with the adaptability of LLMs, offering a practical path toward more reliable and resource-efficient clinical NLP in real-world hospitals.
LGJan 19, 2024
Preservation of Feature Stability in Machine Learning Under Data Uncertainty for Decision Support in Critical DomainsKarol Capała, Paulina Tworek, Jose Sousa
In a world where Machine Learning (ML) is increasingly deployed to support decision-making in critical domains, providing decision-makers with explainable, stable, and relevant inputs becomes fundamental. Understanding how machine learning works under missing data and how this affects feature variability is paramount. This is even more relevant as machine learning approaches focus on standardising decision-making approaches that rely on an idealised set of features. However, decision-making in human activities often relies on incomplete data, even in critical domains. This paper addresses this gap by conducting a set of experiments using traditional machine learning methods that look for optimal decisions in comparison to a recently deployed machine learning method focused on a classification that is more descriptive and mimics human decision making, allowing for the natural integration of explainability. We found that the ML descriptive approach maintains higher classification accuracy while ensuring the stability of feature selection as data incompleteness increases. This suggests that descriptive classification methods can be helpful in uncertain decision-making scenarios.