CL · Aug 24, 2023

A Small and Fast BERT for Chinese Medical Punctuation Restoration

arXiv:2308.12568v4 · 2 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the need for fast and accurate punctuation restoration in clinical dictation, though it is an incremental advance because it builds on existing pre-trained models.

The authors tackle automatic punctuation restoration for Chinese medical dictation, which helps prevent misunderstandings in clinical reports. Their model achieves 95% of the performance of a state-of-the-art model (Chinese RoBERTa) with only 10% of its size.

In clinical dictation, utterances produced by automatic speech recognition (ASR) lack explicit punctuation marks, which may lead to misunderstanding of the dictated reports. To produce precise and understandable clinical reports with ASR, automatic punctuation restoration is required. With a practical deployment scenario in mind, we propose a fast and lightweight pre-trained model for Chinese medical punctuation restoration based on the 'pre-training and fine-tuning' paradigm. In this work, we distill pre-trained models while incorporating supervised contrastive learning and a novel auxiliary pre-training task (Punctuation Mark Prediction) to make them well suited to punctuation restoration. Our experiments on various distilled models show that our model achieves 95% of the performance of the state-of-the-art Chinese RoBERTa with only 10% of its model size.
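The abstract names three ingredients: punctuation restoration as a prediction task, supervised contrastive learning, and knowledge distillation. Below is a minimal pure-Python sketch of what each could look like, assuming punctuation restoration is framed as per-character token classification (each character is tagged with the mark that follows it). The label set, the function names `make_labels`, `sup_con_loss` and `distill_kl`, and the temperature values are hypothetical illustrations, not the authors' code. The contrastive term follows the standard supervised contrastive loss of Khosla et al. (2020) and the distillation term is ordinary soft-target KL distillation (Hinton et al., 2015); whether the paper uses exactly these forms is an assumption.

```python
import math

# Hypothetical label set; the paper's actual punctuation inventory may differ.
PUNCT = {"，": "COMMA", "。": "PERIOD", "？": "QUESTION", "、": "PAUSE"}


def make_labels(text):
    """Strip punctuation; tag each remaining character with the mark that follows it, or 'O'."""
    chars, labels = [], []
    for ch in text:
        if ch in PUNCT:
            if labels:  # attach the mark to the preceding character
                labels[-1] = PUNCT[ch]
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels


def sup_con_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (Khosla et al.) over a batch of token embeddings."""
    # L2-normalise every embedding so dot products are cosine similarities.
    z = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v))
        z.append([x / norm for x in v])
    total, anchors = 0.0, 0
    for i in range(len(z)):
        # Positives: other items in the batch that share the anchor's label.
        positives = [p for p in range(len(z)) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(len(z)) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability of each positive under the softmax over all others.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def distill_kl(teacher_logits, student_logits, temperature=2.0):
    """Soft-target distillation: KL(teacher || student) on temperature-scaled softmaxes."""
    def softmax(xs):
        m = max(xs)
        e = [math.exp((x - m) / temperature) for x in xs]
        s = sum(e)
        return [v / s for v in e]

    p, q = softmax(teacher_logits), softmax(student_logits)
    # Scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


chars, labels = make_labels("患者发热，咳嗽。")
print(chars, labels)
```

For `"患者发热，咳嗽。"` the labeller yields the six characters `患者发热咳嗽` tagged `O, O, O, COMMA, O, PERIOD`. In a training loop these terms would typically be added to a cross-entropy loss on the punctuation labels; the weighting the paper uses is not stated in this summary.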

Code Implementations: 1 repo