CL · LG · Apr 19, 2022

Unsupervised Numerical Reasoning to Extract Phenotypes from Clinical Text by Leveraging External Knowledge

arXiv:2204.10202v1 · 3 citations · h-index: 60
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in clinical phenotyping for healthcare applications: better identification of patients with conditions such as rare diseases. The advance is incremental because it builds on existing models.

The paper tackles the problem of extracting phenotypes from clinical text when doing so requires numerical reasoning, such as interpreting a temperature value as fever. It presents an unsupervised methodology that leverages external knowledge and ClinicalBERT embeddings. It reports substantial performance improvements, with absolute gains in generalized Recall and F1 scores of up to 79% and 71%, respectively, in unsupervised settings, and up to 70% and 44%, respectively, in supervised settings.

Extracting phenotypes from clinical text has been shown to be useful for a variety of clinical use cases such as identifying patients with rare diseases. However, reasoning with numerical values remains challenging for phenotyping in clinical text, for example, temperature 102F representing Fever. Current state-of-the-art phenotyping models are able to detect general phenotypes, but perform poorly when they detect phenotypes requiring numerical reasoning. We present a novel unsupervised methodology leveraging external knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in a variety of phenotypic contexts. Comparing against unsupervised benchmarks, it shows a substantial performance improvement with absolute gains on generalized Recall and F1 scores up to 79% and 71%, respectively. In the supervised setting, it also surpasses the performance of alternative approaches with absolute gains on generalized Recall and F1 scores up to 70% and 44%, respectively.
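
To make the "temperature 102F representing Fever" example concrete, here is a minimal sketch of numerical reasoning driven by external knowledge. It is not the paper's method: it uses plain regular expressions instead of ClinicalBERT embeddings, and the `REFERENCE_RANGES` table, its threshold values, and the `infer_phenotypes` function are hypothetical names and illustrative numbers introduced only for this example.

```python
import re

# Hypothetical external knowledge: illustrative reference ranges, not taken from the paper.
# measurement name -> (low, high, phenotype if below range, phenotype if above range)
REFERENCE_RANGES = {
    "temperature": (95.0, 100.4, "Hypothermia", "Fever"),       # degrees Fahrenheit
    "heart rate": (60.0, 100.0, "Bradycardia", "Tachycardia"),  # beats per minute
}


def infer_phenotypes(text):
    """Return phenotypes implied by out-of-range numeric values found in text."""
    found = []
    lowered = text.lower()
    for name, (low, high, below, above) in REFERENCE_RANGES.items():
        # Match the measurement name, up to 15 non-digit characters, then a number.
        pattern = re.compile(re.escape(name) + r"\D{0,15}?(\d+(?:\.\d+)?)")
        for match in pattern.finditer(lowered):
            value = float(match.group(1))
            if value > high:
                found.append(above)
            elif value < low:
                found.append(below)
    return found


print(infer_phenotypes("Temperature 102F, heart rate of 45 bpm"))
```

A lookup like this only works when the measurement name appears literally in the text. The abstract credits contextualized embeddings with handling a variety of phenotypic contexts, which this sketch does not attempt.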

