CL LG Jun 2, 2021

MedNLI Is Not Immune: Natural Language Inference Artifacts in the Clinical Domain

arXiv:2106.01491v131.7717 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work highlights dataset quality issues in the clinical domain, which could mislead model evaluation and development, though it is incremental as it extends prior findings to a new dataset.

The study investigated whether MedNLI, a clinical natural language inference dataset, contains statistical artifacts similar to those in crowdworker-constructed datasets, and found that it does. Each label class shows characteristic hypothesis patterns, and adversarial filtering shows that performance degrades on the difficult subset.

Crowdworker-constructed natural language inference (NLI) datasets have been found to contain statistical artifacts associated with the annotation process that allow hypothesis-only classifiers to achieve better-than-random performance (Poliak et al., 2018; Gururangan et al., 2018; Tsuchiya, 2018). We investigate whether MedNLI, a physician-annotated dataset with premises extracted from clinical notes, contains such artifacts (Romanov and Shivade, 2018). We find that entailed hypotheses contain generic versions of specific concepts in the premise, as well as modifiers related to responsiveness, duration, and probability. Neutral hypotheses feature conditions and behaviors that co-occur with, or cause, the condition(s) in the premise. Contradiction hypotheses feature explicit negation of the premise and implicit negation via assertion of good health. Adversarial filtering demonstrates that performance degrades when evaluated on the difficult subset. We provide partition information and recommendations for alternative dataset construction strategies for knowledge-intensive domains.
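
To make the idea of a hypothesis-only classifier concrete, here is a minimal sketch that predicts an NLI label from the hypothesis alone, never seeing the premise. It is an illustration under stated assumptions, not the classifier used in the paper: the model is a simple multinomial Naive Bayes with add-one smoothing, the function names (`train_hypothesis_only`, `predict`) are hypothetical, and the training sentences are invented toy examples rather than MedNLI data.

```python
import math
from collections import Counter, defaultdict

# Toy, invented hypotheses (NOT taken from MedNLI). Premises are deliberately
# absent: the classifier only ever sees the hypothesis text.
TRAIN = [
    ("the patient has a chronic condition", "entailment"),
    ("the patient has a cardiac condition", "entailment"),
    ("the patient has a history of smoking", "neutral"),
    ("the patient has a history of diabetes", "neutral"),
    ("the patient has no pain", "contradiction"),
    ("the patient is healthy", "contradiction"),
]


def tokenize(text):
    """Lowercase whitespace tokenization; an assumption made for simplicity."""
    return text.lower().split()


def train_hypothesis_only(examples):
    """Collect per-label token counts from (hypothesis, label) pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for hypothesis, label in examples:
        label_counts[label] += 1
        for tok in tokenize(hypothesis):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, token_counts, vocab


def predict(model, hypothesis):
    """Return the label with the highest Naive Bayes log-score."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        # Add-one smoothing over the training vocabulary.
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokenize(hypothesis):
            if tok in vocab:  # out-of-vocabulary tokens are skipped
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_hypothesis_only(TRAIN)
print(predict(model, "the patient has a history of asthma"))
```

If a model like this beats the random baseline on held-out data, the labels are partly predictable from surface cues in the hypotheses, which is the kind of artifact the abstract describes (for example, negation or assertions of good health in contradiction hypotheses).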
