CL, SD, AS · Apr 11, 2022

Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data

arXiv:2204.05183v2 · 3 citations · h-index: 36
Originality: Incremental advance
AI Analysis

This work addresses a critical problem for medical training by enhancing the robustness of virtual patient systems, though it is incremental as it builds on prior separate solutions.

The paper tackles the combined challenges of ASR errors and class imbalance in a spoken virtual patient system with a two-step training method that uses an ASR error predictor to simulate recognition errors in text data, achieving significant improvements in intent classification over baselines at several word error rates.

A Virtual Patient (VP) is a powerful tool for training medical students to take patient histories, where responding to a diverse set of spoken questions is essential to simulate natural conversations with a student. The performance of such a Spoken Language Understanding (SLU) system can be adversely affected both by Automatic Speech Recognition (ASR) errors in the test data and by a high degree of class imbalance in the SLU training data. While these two issues have been addressed separately in prior work, we develop a novel two-step training methodology that tackles both effectively in a single dialog agent. Because it is difficult to collect spoken data from users without a functioning SLU system, our method does not rely on spoken data for training; rather, we use an ASR error predictor to "speechify" the text data. Our method shows significant improvements over strong baselines on the VP intent classification task at various word error rate settings.
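To make the "speechify" idea concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not describe how the ASR error predictor works, so a hand-written confusion table (`CONFUSIONS`, hypothetical) stands in for it. The function names `speechify` and `word_error_rate` and the `error_rate` parameter are also assumptions made for illustration. The second function computes word error rate in the standard way, as word-level edit distance divided by reference length, so the noise level of the simulated text can be checked.

```python
import random

# Hypothetical confusion table: word -> plausible ASR misrecognitions.
# A learned ASR error predictor would replace this lookup (assumption).
CONFUSIONS = {
    "pain": ["pane", "pay"],
    "your": ["you're", "you"],
    "where": ["wear", "were"],
}


def speechify(text, error_rate, rng):
    """Swap each confusable word for a confusion with probability error_rate."""
    out = []
    for word in text.split():
        options = CONFUSIONS.get(word)
        if options and rng.random() < error_rate:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev holds the previous row of the edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "where is your pain"
    noisy = speechify(clean, error_rate=0.5, rng=rng)
    print(noisy, word_error_rate(clean, noisy))
```

Sweeping `error_rate` in a sketch like this produces text-only training data at different simulated word error rates, which mirrors the evaluation setting the abstract describes without requiring any recorded speech.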

