CL, Jul 11, 2023

Improved POS tagging for spontaneous, clinical speech using data augmentation

arXiv:2307.05796v1 | 1 citation | h-index: 68
Originality: Incremental advance
AI Analysis

This work addresses accurate POS tagging for clinical speech analysis, which matters for healthcare applications. The contribution is incremental: it builds on existing methods by adding data augmentation.

The paper improves part-of-speech (POS) tagging for transcripts of spontaneous clinical speech from patients with neurodegenerative conditions. It applies data augmentation to an out-of-domain newswire treebank so that the training data resembles natural speech, and it reports improved performance on manually validated clinical speech data.

This paper addresses the problem of improving POS tagging of transcripts of speech from clinical populations. In contrast to prior work on parsing and POS tagging of transcribed speech, we do not make use of an in-domain treebank for training. Instead, we train on an out-of-domain treebank of newswire, using data augmentation techniques to make these structures resemble natural, spontaneous speech. We trained a parser with and without the augmented data and tested its performance using manually validated POS tags in clinical speech produced by patients with various types of neurodegenerative conditions.
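
To make the idea of augmenting newswire so it resembles speech concrete, here is a minimal sketch. It is not the authors' method: the abstract does not say which augmentations were used, so the specific transformations (dropping punctuation, lowercasing, inserting filled pauses), the `FILLERS` list, and the `augment` function are all assumptions chosen for illustration. Tags follow the Penn Treebank convention, where `UH` marks interjections.

```python
import random

# Hypothetical filler inventory; the abstract does not list the augmentations used.
FILLERS = [("uh", "UH"), ("um", "UH")]

# Penn Treebank tags for punctuation tokens, which speech transcripts usually lack.
PUNCT_TAGS = {".", ",", ":", "``", "''", "-LRB-", "-RRB-"}


def augment(tagged, filler_rate=0.1, seed=0):
    """Make a tagged newswire sentence look more like a speech transcript.

    tagged: list of (token, pos_tag) pairs.
    filler_rate: chance of inserting a filled pause before each kept token.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    out = []
    for token, tag in tagged:
        if tag in PUNCT_TAGS:
            continue  # drop punctuation tokens
        if rng.random() < filler_rate:
            out.append(rng.choice(FILLERS))  # insert a tagged filled pause
        out.append((token.lower(), tag))  # lowercase, keep the gold tag
    return out


sentence = [("The", "DT"), ("market", "NN"), ("fell", "VBD"), (".", ".")]
print(augment(sentence, filler_rate=0.0))
```

Because each inserted filler carries its own tag and the original tokens keep theirs, the augmented sentences remain usable as supervised training data for a tagger.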

