Exploring Long-Term Prediction of Type 2 Diabetes Microvascular Complications
This work addresses the challenge of integrating diverse clinical data for healthcare providers. Its contribution is incremental, as it builds on existing code-agnostic methods.
The study addresses the prediction of long-term microvascular complications in Type 2 Diabetes from electronic healthcare records. It uses a code-agnostic approach that represents records as text and encodes them with clinical language models. This approach outperformed code-based models and performed better at longer prediction windows, but its predictions were biased toward the first complication to occur.
Electronic healthcare records (EHR) contain a wealth of data that can support the prediction of clinical outcomes. EHR data is often stored and analysed using clinical codes (ICD-10, SNOMED); however, these can differ across registries and healthcare providers. Integrating data across systems involves mapping between different clinical ontologies, which requires domain expertise and at times results in data loss. To overcome this, code-agnostic models have been proposed. We assess the effectiveness of a code-agnostic representation approach on the task of long-term microvascular complication prediction for individuals living with Type 2 Diabetes. Our method encodes individual EHRs as text using fine-tuned, pretrained clinical language models. Leveraging large-scale EHR data from the UK, we employ a multi-label approach to simultaneously predict the risk of microvascular complications across 1-, 5-, and 10-year windows. We demonstrate that a code-agnostic approach outperforms a code-based model and illustrate that performance is better with longer prediction windows but is biased towards the first occurring complication. Overall, we highlight that context length is vitally important for model performance. This study shows that data from different clinical ontologies can be included in a single model and is a starting point for generalisable clinical models.
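As a rough illustration of the two ideas above, the following minimal sketch serialises a record as plain text and builds a multi-label target over the 1-, 5-, and 10-year windows. It is not the study's implementation. The complication names, the event format, the approximation of a year as 365.25 days, and the helper names `record_to_text` and `multilabel_target` are all assumptions made for the example.

```python
from datetime import date, timedelta

# Assumed label set (not listed in the abstract); windows follow the abstract.
COMPLICATIONS = ("retinopathy", "nephropathy", "neuropathy")
WINDOWS_YEARS = (1, 5, 10)


def record_to_text(events):
    """Serialise (date, description) events as plain text, oldest first.

    Using descriptions instead of codes is what makes the input code-agnostic.
    """
    return " ".join(f"{d.isoformat()}: {desc}." for d, desc in sorted(events))


def multilabel_target(index_date, onset_dates):
    """Return a flat 0/1 vector with one entry per (complication, window)."""
    target = []
    for comp in COMPLICATIONS:
        onset = onset_dates.get(comp)  # None if the complication never occurs
        for years in WINDOWS_YEARS:
            # Assumption: a year is approximated as 365.25 days.
            horizon = index_date + timedelta(days=round(365.25 * years))
            target.append(int(onset is not None and index_date < onset <= horizon))
    return target


# Hypothetical patient record, for illustration only.
events = [
    (date(2010, 3, 1), "Type 2 diabetes mellitus"),
    (date(2009, 6, 15), "Raised HbA1c"),
]
text = record_to_text(events)
target = multilabel_target(
    date(2010, 3, 1),
    {"retinopathy": date(2013, 5, 1), "neuropathy": date(2019, 1, 1)},
)
```

In this sketch the text would be passed to a language model encoder, and the nine-element target would supervise one output per complication and window. Because a complication that falls inside the 5-year window also falls inside the 10-year window, the labels for longer windows are never sparser than those for shorter ones.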