CL, AI · Feb 21, 2025

Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts

Aditya Kumar, Simon Rauch, Mario Cypko, Oliver Amft

arXiv:2502.15996v2 · 1 citation · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the processing of clinical narratives for healthcare applications. The proposed method could support clinical decision-making and personalized treatment, but the contribution appears incremental because it builds on existing sentence transformers and adds hybrid tuning.

The paper addresses the extraction of actionable information from unstructured clinical texts. It introduces a contextual embedding model, med-gte-hybrid, which is reported to outperform state-of-the-art models on the Massive Text Embedding Benchmark (MTEB) and to improve patient stratification, clustering, and text retrieval in clinical prediction tasks.

We introduce a novel contextual embedding model, med-gte-hybrid, derived from the gte-large sentence transformer, to extract information from unstructured clinical narratives. Our model tuning strategy for med-gte-hybrid combines contrastive learning and a denoising autoencoder. To evaluate the performance of med-gte-hybrid, we investigate several clinical prediction tasks in large patient cohorts extracted from the MIMIC-IV dataset, including Chronic Kidney Disease (CKD) patient prognosis, estimated glomerular filtration rate (eGFR) prediction, and patient mortality prediction. Furthermore, we demonstrate that the med-gte-hybrid model improves patient stratification, clustering, and text retrieval, thereby outperforming current state-of-the-art models on the Massive Text Embedding Benchmark (MTEB). While some of our evaluations focus on CKD, our hybrid tuning of sentence transformers could be transferred to other medical domains and has the potential to improve clinical decision-making and personalised treatment pathways in various healthcare applications.
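The abstract does not say how the contrastive and denoising objectives are combined, so the following is only a toy sketch of the two ingredients, written in plain Python on hand-made vectors. Everything in it is an assumption made for illustration: the function names (`info_nce`, `delete_tokens`, `reconstruction_loss`, `hybrid_loss`), the temperature, the deletion ratio, and in particular the weighted sum controlled by `alpha`. The authors may combine the two tuning strategies in a different way.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.05):
    """In-batch contrastive loss: anchor i should be closest to positive i.

    The temperature value is an illustrative assumption.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def delete_tokens(tokens, ratio, rng):
    """Denoising-style corruption: drop each token with probability `ratio`."""
    kept = [t for t in tokens if rng.random() >= ratio]
    return kept or tokens[:1]  # never return an empty sentence


def reconstruction_loss(predicted_probs, target_tokens):
    """Mean negative log-likelihood of the original (uncorrupted) tokens.

    `predicted_probs[k]` maps candidate tokens to probabilities at position k.
    """
    nll = -sum(math.log(p[t]) for p, t in zip(predicted_probs, target_tokens))
    return nll / len(target_tokens)


def hybrid_loss(contrastive, denoising, alpha=0.5):
    """Hypothetical combination: a weighted sum of the two objectives."""
    return alpha * contrastive + (1.0 - alpha) * denoising


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "patient shows reduced egfr and elevated creatinine".split()
    print("corrupted input:", delete_tokens(sentence, 0.6, rng))

    embeddings = [[1.0, 0.0], [0.0, 1.0]]
    contrastive = info_nce(embeddings, embeddings)
    denoising = reconstruction_loss([{"egfr": 0.5}], ["egfr"])
    print("hybrid loss:", hybrid_loss(contrastive, denoising))
```

In a real setup the vectors would come from the sentence transformer being tuned and the token probabilities from a decoder that reconstructs the uncorrupted note; here both are replaced by fixed toy values so that the arithmetic of each loss is easy to follow.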
