CL | Sep 28, 2022

Clinical Language Understanding Evaluation (CLUE)

arXiv:2209.14377v1 | 0.31 citations | h-index: 55

Originality: Synthesis-oriented

AI Analysis

This work addresses the difficulty of comparing approaches to clinical language processing tasks such as disease phenotyping and mortality prediction. The contribution is incremental, in that it standardizes existing evaluation practices rather than introducing new ones.

The authors tackled the problem of inconsistent evaluation in clinical language processing by creating the Clinical Language Understanding Evaluation (CLUE) benchmark, which provides standardized tasks, data from MIMIC, and a software toolkit to enable direct comparison and improve reproducibility.

Clinical language processing has received a lot of attention in recent years, resulting in new models or methods for disease phenotyping, mortality prediction, and other tasks. Unfortunately, many of these approaches are tested under different experimental settings (e.g., data sources, training and testing splits, metrics, and evaluation criteria), making it difficult to compare approaches and determine the state of the art. To address these issues and facilitate reproducibility and comparison, we present the Clinical Language Understanding Evaluation (CLUE) benchmark with a set of four clinical language understanding tasks; standard training, development, validation, and testing sets derived from MIMIC data; and a software toolkit. It is our hope that these data will enable direct comparison between approaches, improve reproducibility, and reduce the barrier to entry for developing novel models or methods for these clinical language understanding tasks.
