IR CL

Aug 2, 2021

Self-supervised Answer Retrieval on Clinical Notes

Paul Grundmann, Sebastian Arnold, Alexander Löser

arXiv:2108.00775v1

6.33 citations

Originality: Incremental advance

AI Analysis

The paper addresses domain-specific passage retrieval in clinical settings, where doctors search clinical notes. The contribution appears incremental: it builds on existing Transformer architectures and adds a new training objective.

The paper tackles the problem of retrieving answer passages from clinical notes for patient cohort identification by introducing CAPR, a rule-based self-supervision objective for training Transformer models, and reports that it outperforms strong baselines on MIMIC-III and other healthcare datasets while generalizing effectively in zero-shot scenarios.

Retrieving answer passages from long documents is a complex task requiring semantic understanding of both discourse and document context. We approach this challenge specifically in a clinical scenario, where doctors retrieve cohorts of patients based on diagnoses and other latent medical aspects. We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. In addition, we contribute a novel retrieval dataset based on clinical notes to simulate this scenario on a large corpus of clinical notes. We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. From our extensive evaluation on MIMIC-III and three other healthcare datasets, we report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages. This makes the model powerful especially in zero-shot scenarios where only limited training data is available.
