IR · CL · Jun 26, 2025

Cohort Retrieval using Dense Passage Retrieval

arXiv:2507.01049v1
Originality: Synthesis-oriented
AI Analysis

This work addresses cohort retrieval for medical researchers and clinicians in the echocardiography domain, but the contribution is incremental: it applies an existing method to a new medical domain.

The paper tackles patient cohort retrieval in echocardiography by applying Dense Passage Retrieval (DPR) to unstructured EHR data. The resulting custom-trained model outperforms traditional and off-the-shelf state-of-the-art methods.

Patient cohort retrieval is a pivotal task in medical research and clinical practice, enabling the identification of specific patient groups from extensive electronic health records (EHRs). In this work, we address the challenge of cohort retrieval in the echocardiography domain by applying Dense Passage Retrieval (DPR), a prominent methodology in semantic search. We propose a systematic approach to transform an unstructured echocardiographic EHR dataset into a Query-Passage dataset, framing the problem as a Cohort Retrieval task. Additionally, we design and implement evaluation metrics inspired by real-world clinical scenarios to rigorously test the models across diverse retrieval tasks. Furthermore, we present a custom-trained DPR embedding model that demonstrates superior performance compared to traditional and off-the-shelf state-of-the-art (SOTA) methods. To our knowledge, this is the first work to apply DPR for patient cohort retrieval in the echocardiography domain, establishing a framework that can be adapted to other medical domains.
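As a rough illustration of the retrieval step the abstract describes, the sketch below ranks passages by the inner product between a query embedding and passage embeddings, which is the scoring DPR uses, and then scores the result with precision and recall at k. Everything concrete here is an assumption made for illustration: the three-dimensional vectors stand in for the output of a trained encoder, the report ids and relevance labels are invented, and precision/recall are generic stand-ins, not the clinically inspired metrics the paper designs.

```python
from typing import Dict, List, Sequence, Set


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, the similarity score used in DPR-style retrieval."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec: Sequence[float],
             passage_vecs: Dict[str, Sequence[float]],
             k: int) -> List[str]:
    """Return the ids of the k passages scoring highest against the query."""
    ranked = sorted(
        passage_vecs,
        key=lambda pid: dot(query_vec, passage_vecs[pid]),
        reverse=True,
    )
    return ranked[:k]


def precision_at_k(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of retrieved passages that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for pid in retrieved if pid in relevant) / len(retrieved)


def recall_at_k(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of relevant passages that were retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for pid in retrieved if pid in relevant) / len(relevant)


# Hypothetical toy embeddings; a real system would obtain these from
# trained query and passage encoders.
PASSAGES = {
    "report_a": [0.9, 0.1, 0.0],
    "report_b": [0.8, 0.3, 0.1],
    "report_c": [0.0, 0.2, 0.9],
    "report_d": [0.1, 0.9, 0.2],
}
QUERY = [1.0, 0.2, 0.0]
RELEVANT = {"report_a", "report_b"}  # invented ground-truth cohort

if __name__ == "__main__":
    top = retrieve(QUERY, PASSAGES, k=3)
    print(top, precision_at_k(top, RELEVANT), recall_at_k(top, RELEVANT))
```

In a cohort setting, each relevant passage would map back to a patient record, so the retrieved set doubles as the candidate cohort for the query.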

