CL · Jul 5, 2023

PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records

Viktor Schlegel, Hao Li, Yuping Wu, Anand Subramanian, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Daniel Beck, Xiaojun Zeng, Riza Theresa Batista-Navarro, Stefan Winkler, Goran Nenadic

Tencent

arXiv:2307.02006v14.914 citations · h-index: 35 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need to generate medical records automatically from dialogues. It is incremental: it builds on existing methods, and the proposed augmentations yield only limited gains.

The paper tackles the problem of summarizing patient-doctor dialogues into clinical records with PULSAR, a system that combines domain-specific pre-training with synthetic data augmentation. The authors found that scaling up the language model gave the largest performance gains. Their submissions ranked second and third among 13 on task B of the MEDIQA-Sum 2023 challenge.

This paper describes PULSAR, our system submission at the ImageCLEF 2023 MEDIQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training to produce a specialised language model, which is then trained on task-specific natural data augmented by synthetic data generated by a black-box LLM. We find limited evidence towards the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the best performance gains. Our approach was ranked second and third among 13 submissions on task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.

