Seq-2-Seq based Refinement of ASR Output for Spoken Name Capture
This work addresses the specific challenge of spoken name capture for automated systems, representing an incremental improvement over existing methods.
The paper tackles the problem of capturing person names from speech in human-machine conversations by proposing a lightweight Seq-2-Seq system that generates a name spelling from varied user input, outperforming a strong LM-driven rule-based baseline.
Person name capture from human speech is a difficult task in human-machine conversations. In this paper, we propose a novel approach to capturing person names from caller utterances given in response to the prompt "say and spell your first/last name". Inspired by work on spell correction, disfluency removal, and text normalization, we propose a lightweight Seq-2-Seq system that generates a name spelling from varied user input. Our proposed method outperforms a strong baseline based on an LM-driven rule-based approach.
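To make the input/output mapping of the task concrete, the following is a minimal sketch of a toy rule-based heuristic that recovers a spelled name from an ASR transcript. It is an illustration only: it is neither the LM-driven baseline nor the Seq-2-Seq model described above, and the function name `extract_spelled_name`, the "double <letter>" convention, and the example utterances are all assumptions introduced here.

```python
def extract_spelled_name(utterance: str) -> str:
    """Return the longest run of spelled-out letters in an ASR transcript.

    Hypothetical toy heuristic: single-letter tokens are treated as spelled
    letters, and "double <letter>" is expanded to two copies of the letter.
    """
    tokens = utterance.lower().split()
    best, current = [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == "double" and len(nxt) == 1 and nxt.isalpha():
            # "double n" -> "n", "n"
            current.extend([nxt, nxt])
            i += 2
            continue
        if len(tok) == 1 and tok.isalpha():
            current.append(tok)
        else:
            # A non-letter token ends the current run; keep the longest run.
            if len(current) > len(best):
                best = current
            current = []
        i += 1
    if len(current) > len(best):
        best = current
    return "".join(best)


if __name__ == "__main__":
    print(extract_spelled_name("my name is john j o h n"))
    print(extract_spelled_name("anna a double n a"))
```

A heuristic like this breaks down quickly on disfluencies, corrections, and ASR misrecognitions of letters, which is the kind of variation in user input that motivates a learned sequence-to-sequence refinement step.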