CL · Feb 3, 2020

Learning Contextualized Document Representations for Healthcare Answer Retrieval

arXiv:2002.00835v1 · 15 citations
AI Analysis

This work addresses the challenge of efficiently retrieving answers for both patients and medical professionals across heterogeneous healthcare domains, without requiring domain-specific fine-tuning.

The paper tackles the problem of retrieving relevant answer passages from long healthcare documents by introducing Contextual Discourse Vectors (CDV), a document representation method that significantly outperforms state-of-the-art baselines for healthcare passage ranking across nine public health resources.

We present Contextual Discourse Vectors (CDV), a distributed document representation for efficient answer retrieval from long healthcare documents. Our approach is based on structured query tuples of entities and aspects from free text and medical taxonomies. Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse. We use our continuous representations to resolve queries with short latency using approximate nearest neighbor search at the sentence level. We apply the CDV model for retrieving coherent answer passages from nine English public health resources from the Web, addressing both patients and medical professionals. Because there is no end-to-end training data available for all application scenarios, we train our model with self-supervised data from Wikipedia. We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking and is able to adapt to heterogeneous domains without additional fine-tuning.
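
The sentence-level retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors stand in for encoder outputs, the names `cosine` and `top_k_sentences` are hypothetical, the query is assumed to be a single precomputed vector (how entity and aspect encodings are combined is not stated here), and an exact brute-force scan replaces the approximate nearest neighbor search the paper uses.

```python
import math
from typing import List, Tuple

Vector = List[float]
# One index entry per sentence: (document id, sentence position, sentence vector).
IndexEntry = Tuple[str, int, Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_sentences(query: Vector, index: List[IndexEntry], k: int = 3) -> List[Tuple[str, int, float]]:
    """Return the k sentences most similar to the query as (doc_id, sentence_idx, score).

    Assumption: exact linear scan. A real system would swap this for an
    approximate nearest neighbor index to keep latency low on large corpora.
    """
    scored = [(cosine(query, vec), doc_id, sent_idx) for doc_id, sent_idx, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(doc_id, sent_idx, score) for score, doc_id, sent_idx in scored[:k]]


# Toy data (hypothetical): in practice these vectors would come from the document encoder.
index: List[IndexEntry] = [
    ("doc_a", 0, [1.0, 0.0, 0.0]),
    ("doc_a", 1, [0.9, 0.1, 0.0]),
    ("doc_b", 0, [0.0, 1.0, 0.0]),
    ("doc_b", 1, [0.0, 0.0, 1.0]),
]
query: Vector = [1.0, 0.05, 0.0]  # stand-in for an encoded (entity, aspect) query

results = top_k_sentences(query, index, k=2)
for doc_id, sent_idx, score in results:
    print(f"{doc_id} sentence {sent_idx}: {score:.4f}")
```

Because every sentence carries its document id and position, adjacent high-scoring sentences from the same document could be merged into a passage afterwards; that merging step is left out of this sketch.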

Code Implementations: 1 repo
