LG | Jan 12

Computing patient similarity based on unstructured clinical notes

Petr Zelina, Marko Řeháček, Jana Halámková, Lucia Bohovicová, Martin Rusinko, Vít Nováček

arXiv:2601.07385v1 | 1.41 citations | h-index: 18 | TSD

Originality: Incremental advance

AI Analysis

This work addresses the challenge of exploiting unstructured clinical notes for precision medicine, offering a method to improve patient similarity analysis for healthcare applications, though it appears incremental as it builds on existing embedding and matrix techniques.

The paper tackles the problem of computing patient similarity from unstructured clinical notes by representing each patient as a matrix built from aggregated note embeddings, which enables robust similarity computation. Evaluated on clinical notes from 4,267 breast-cancer patients with expert similarity labels, the results demonstrate the method's usefulness for downstream tasks such as personalized therapy recommendations.

Clinical notes hold rich yet unstructured details about diagnoses, treatments, and outcomes that are vital to precision medicine but hard to exploit at scale. We introduce a method that represents each patient as a matrix built from aggregated embeddings of all their notes, enabling robust patient similarity computation based on their latent low-rank representations. Using clinical notes of 4,267 Czech breast-cancer patients and expert similarity labels from Masaryk Memorial Cancer Institute, we evaluate several matrix-based similarity measures and analyze their strengths and limitations across different similarity facets, such as clinical history, treatment, and adverse events. The results demonstrate the usefulness of the presented method for downstream tasks, such as personalized therapy recommendations or toxicity warnings.
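
The abstract does not spell out how the patient matrices are built or compared, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each note has already been embedded as a fixed-length vector, takes a truncated SVD of the stacked embeddings as the "latent low-rank representation", and compares two patients by the mean squared cosine of the principal angles between their subspaces. The function names `patient_subspace` and `subspace_similarity`, the rank, and the choice of similarity measure are all hypothetical.

```python
import numpy as np


def patient_subspace(note_embeddings: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (d x k, k <= rank) for a patient's dominant note directions.

    note_embeddings has shape (n_notes, d): one embedding vector per note.
    Assumption: a truncated SVD stands in for the low-rank representation.
    """
    # Left singular vectors of the (d x n_notes) matrix span the note embeddings.
    u, _, _ = np.linalg.svd(note_embeddings.T, full_matrices=False)
    return u[:, :rank]


def subspace_similarity(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between two subspaces.

    The result is 1 for identical subspaces and 0 for orthogonal ones
    (up to floating-point error).
    """
    # Singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(cosines ** 2))


# Synthetic stand-in data: two "patients" with 6 and 9 notes in a 16-dim space.
rng = np.random.default_rng(0)
patient_a = rng.normal(size=(6, 16))
patient_b = rng.normal(size=(9, 16))

basis_a = patient_subspace(patient_a, rank=3)
basis_b = patient_subspace(patient_b, rank=3)
score = subspace_similarity(basis_a, basis_b)
```

A subspace comparison like this ignores how many notes a patient has and in what order they were written, which is one reason a matrix representation can be more robust than comparing single averaged vectors. Other matrix-based measures could be substituted for `subspace_similarity` without changing the rest of the sketch.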
