CL, CY · May 17, 2018

Annotating Electronic Medical Records for Question Answering

arXiv:1805.06816v1
17 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of developing machine learning systems for patient-specific medical queries. Its contribution is incremental, since it focuses on dataset creation rather than novel QA methods.

The paper tackles the lack of datasets for question answering on electronic health records by defining a replicable annotation process. The resulting dataset contains 5696 questions over 71 patient records, of which 1747 have answers, and the process yields high inter-annotator agreement (0.71 Cohen's kappa).

Our research is in the relatively unexplored area of question answering technologies for patient-specific questions over their electronic health records. A large dataset of human expert curated question and answer pairs is an important pre-requisite for developing, training and evaluating any question answering system that is powered by machine learning. In this paper, we describe a process for creating such a dataset of questions and answers. Our methodology is replicable, can be conducted by medical students as annotators, and results in high inter-annotator agreement (0.71 Cohen's kappa). Over the course of 11 months, 11 medical students followed our annotation methodology, resulting in a question answering dataset of 5696 questions over 71 patient records, of which 1747 questions have corresponding answers generated by the medical students.
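For readers unfamiliar with the agreement statistic cited above, the following is a minimal sketch of how Cohen's kappa is computed for two annotators labelling the same items. It uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the toy labels are illustrative assumptions; they are not the paper's data, and the abstract does not say which annotation decisions the reported 0.71 was measured on.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items (standard definition)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal probabilities of using it, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (1 = "answerable", 0 = "not answerable"), not from the paper.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 1, 0, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on 3 of 4 items (p_o = 0.75), chance agreement is p_e = 0.5, and kappa comes out at 0.5. By the same scale, the paper's reported 0.71 indicates agreement well above chance.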

