Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech
This work addresses syntactic parsing of spoken language, offering an incremental improvement by comparing two parsing paradigms (graph-based and sequence labeling) on realistic, spontaneous French conversations.
The study tackles dependency parsing directly from the speech signal, which makes prosodic information available to the parser and avoids the limitations of a pipeline. It finds that the graph-based approach outperforms sequence labeling, and that parsing directly from speech beats a pipeline method despite using 30% fewer parameters.
Direct dependency parsing of the speech signal, as opposed to parsing speech transcriptions, has recently been proposed as a task (Pupier et al. 2022). It is a way of incorporating prosodic information into the parsing system and of bypassing the limitations of a pipeline approach, which would first apply an Automatic Speech Recognition (ASR) system and then a syntactic parser. In this article, we report on a set of experiments assessing the performance of two parsing paradigms (graph-based parsing and sequence-labeling-based parsing) on speech parsing. We perform this evaluation on a large treebank of spoken French, featuring realistic spontaneous conversations. Our findings show that (i) the graph-based approach obtains better results across the board, and (ii) parsing directly from speech outperforms a pipeline approach, despite having 30% fewer parameters.
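To make the contrast between the two paradigms concrete, here is a minimal sketch on text tokens. It is not the system evaluated in this article. The example sentence, the relative-offset label scheme, the function names and the score values are all illustrative assumptions. The sketch only shows how a dependency tree can be cast either as one label per word (sequence labeling) or as a choice of head from a matrix of arc scores (graph-based).

```python
# Toy illustration only; not the authors' implementation.
# Words are indexed from 1; head index 0 stands for the artificial root.

def encode_relative(heads):
    """Sequence-labeling view: one label per word, the signed offset to its head."""
    labels = []
    for i, h in enumerate(heads, start=1):
        labels.append("root" if h == 0 else f"{h - i:+d}")
    return labels


def decode_relative(labels):
    """Invert encode_relative: recover head indices from per-word labels."""
    heads = []
    for i, lab in enumerate(labels, start=1):
        heads.append(0 if lab == "root" else i + int(lab))
    return heads


def greedy_heads(scores):
    """Graph-based view: scores[d][h] scores head h (0 = root) for word d + 1.

    Each word independently takes its best-scoring head. This greedy choice
    does not guarantee a well-formed tree; a tree-constrained decoder such as
    Chu-Liu/Edmonds would be needed for that.
    """
    heads = []
    for d, row in enumerate(scores, start=1):
        candidates = [h for h in range(len(row)) if h != d]  # no self-loops
        heads.append(max(candidates, key=lambda h: row[h]))
    return heads


# Hypothetical sentence: "elle mange une pomme"
# elle -> mange, mange -> root, une -> pomme, pomme -> mange
gold_heads = [2, 0, 4, 2]

labels = encode_relative(gold_heads)

# Made-up arc scores (rows: words 1..4, columns: heads 0..4).
toy_scores = [
    [0.1, 0.0, 0.8, 0.0, 0.1],
    [0.9, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.2, 0.0, 0.8],
    [0.1, 0.0, 0.7, 0.2, 0.0],
]
predicted_heads = greedy_heads(toy_scores)

print(labels)
print(predicted_heads)
```

In the sequence-labeling view the parser only has to emit one tag per word, whereas the graph-based view scores every candidate head for every word before decoding. In a speech parser, the word-level representations feeding either scheme would be derived from the audio signal rather than from text.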