CL, Aug 21, 2018

Lessons from Natural Language Inference in the Clinical Domain

arXiv:1808.06752v21180 citations
AI Analysis

This paper addresses the challenge of applying NLP models to knowledge-intensive clinical settings where data is scarce. The contribution is incremental, since it adapts existing methods to a new domain.

The paper tackles the poor generalization of deep neural networks in specialized domains with limited training data. It introduces MedNLI, a doctor-annotated dataset for natural language inference in the clinical domain, and reports performance gains from transfer learning and from incorporating domain knowledge.

State-of-the-art models using deep neural networks have become very good at learning an accurate mapping from inputs to outputs. However, they still lack generalization capabilities in conditions that differ from the ones encountered during training. This is even more challenging in specialized, knowledge-intensive domains, where training data is limited. To address this gap, we introduce MedNLI, a dataset annotated by doctors for a natural language inference (NLI) task grounded in the medical history of patients. We present strategies to: 1) leverage transfer learning using datasets from the open domain (e.g. SNLI) and 2) incorporate domain knowledge from external data and lexical sources (e.g. medical terminologies). Our results demonstrate performance gains using both strategies.

Code Implementations: 3 repos
