CL · Apr 1, 2022

PriMock57: A Dataset Of Primary Care Mock Consultations

arXiv:2204.00333v1 · 655 citations · h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work gives medical AI researchers a resource that sidesteps the privacy barriers around clinical data. Its contribution is incremental, though: it offers a new dataset rather than a novel method.

The authors address restricted access to clinical datasets by releasing PriMock57, a public dataset of 57 mock primary care consultations with audio recordings, manual transcriptions, and consultation notes. The dataset supports benchmarking of medical ASR and of consultation note generation.

Recent advances in Automatic Speech Recognition (ASR) have made it possible to reliably produce automatic transcripts of clinician-patient conversations. However, access to clinical datasets is heavily restricted due to patient privacy, which slows down normal research practices. We detail the development of a public access, high quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as for consultation note generation from transcripts.
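A common way to score an ASR system against manual transcripts like these is word error rate (WER). WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a minimal illustration, not the PriMock57 authors' evaluation code. The lowercase-and-strip-punctuation normalisation and the function names (`normalise`, `edit_distance`, `word_error_rate`, `corpus_wer`) are assumptions chosen for the example.

```python
import re
from typing import Iterable, List, Tuple


def normalise(text: str) -> List[str]:
    # Assumed normalisation: lowercase, keep alphanumerics and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    # Levenshtein distance over words, keeping one DP row at a time.
    # prev[j] holds the distance between the ref prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    # Pool edits and reference words over all (reference, hypothesis) pairs,
    # e.g. utterance-level transcripts, rather than averaging per-utterance WER.
    edits = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalise(reference), normalise(hypothesis)
        edits += edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return edits / words
```

For example, `word_error_rate("I have had a headache for three days", "I had a headache for two days")` needs one deletion ("have") and one substitution ("three" to "two") against 8 reference words, giving 0.25. Pooling counts in `corpus_wer` keeps short utterances from dominating the score, which matters when transcripts are split at the utterance level.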

Code Implementations: 1 repo