Speak & Spell: LLM-Driven Controllable Phonetic Error Augmentation for Robust Dialogue State Tracking
The work addresses robustness in task-oriented dialogue systems for users in noisy environments, but the contribution is incremental because it builds on existing data augmentation techniques.
The paper tackles the problem of dialogue state tracking accuracy dropping in spoken dialogue due to ASR errors by introducing a controllable phonetic error augmentation method, resulting in improved accuracy in noisy and low-accuracy ASR environments.
Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of the DST model. Our novel method controls the placement of errors using keyword-highlighted prompts while introducing phonetically similar errors. As a result, our method generates sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
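To make the idea of a keyword-highlighted prompt concrete, here is a minimal sketch of how such a prompt might be assembled. It is an illustration under stated assumptions, not the authors' implementation: the `<hl>` marker, the prompt wording, and the function names `highlight_keywords` and `build_prompt` are all hypothetical, and the call to the LLM itself is left out.

```python
import re


def highlight_keywords(utterance, keywords, open_tag="<hl>", close_tag="</hl>"):
    """Wrap each keyword occurrence in marker tags (hypothetical marker format)."""
    # Try longer keywords first so "Cambridge Arms" wins over "Cambridge".
    ordered = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not ordered:
        return utterance
    pattern = re.compile("|".join(re.escape(k) for k in ordered), flags=re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", utterance)


def build_prompt(utterance, keywords):
    """Build an instruction asking for phonetic errors only on highlighted spans.

    The wording below is an assumption made for illustration; the abstract
    does not give the actual prompt text.
    """
    highlighted = highlight_keywords(utterance, keywords)
    return (
        "Rewrite the utterance as a speech recognizer might mishear it.\n"
        "Replace only the text inside <hl>...</hl> with a phonetically similar error "
        "and leave all other words unchanged.\n"
        f"Utterance: {highlighted}\n"
        "Noisy utterance:"
    )


if __name__ == "__main__":
    # Hypothetical example utterance with one named entity (a slot value).
    print(build_prompt("I need a taxi to Cambridge Arms", ["Cambridge Arms"]))
```

The resulting string would be sent to an LLM of choice, and the returned noisy utterance would be paired with the original dialogue state labels as an additional training example. Highlighting the slot values is what concentrates the errors on named entities, the tokens the abstract identifies as the main source of DST failures.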