CL · MM · Jan 27, 2024

Towards Event Extraction from Speech with Contextual Clues

arXiv:2401.15385v1 · 3 citations · h-index: 44 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses event extraction from continuous speech signals, with applications in audio processing and natural language understanding. It is a novel but incremental advance.

The paper tackles the under-explored problem of extracting semantic events directly from speech, introducing the Speech Event Extraction (SpeechEE) task with three synthetic training sets and one human-spoken test set. The proposed method achieves a maximum F1 gain of 10.7% over baselines.

While text-based event extraction has been an active research area and has seen successful application in many domains, extracting semantic events from speech directly is an under-explored problem. In this paper, we introduce the Speech Event Extraction (SpeechEE) task and construct three synthetic training sets and one human-spoken test set. Compared to event extraction from text, SpeechEE poses greater challenges mainly due to complex speech signals that are continuous and have no word boundaries. Additionally, unlike perceptible sound events, semantic events are more subtle and require a deeper understanding. To tackle these challenges, we introduce a sequence-to-structure generation paradigm that can produce events from speech signals in an end-to-end manner, together with a conditioned generation method that utilizes speech recognition transcripts as the contextual clue. We further propose to represent events with a flat format to make outputs more natural language-like. Our experimental results show that our method brings significant improvements on all datasets, achieving a maximum F1 gain of 10.7%. The code and datasets are released on https://github.com/jodie-kang/SpeechEE.
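To make the idea of a flat, natural-language-like event format concrete, here is a minimal sketch of how structured events might be linearized into a single target string for a sequence-to-structure model and parsed back from generated text. The abstract does not specify the actual format, so the `Event` class, the separators, and the `type:` / `trigger:` field names are all assumptions made for illustration; the released repository is the authoritative reference.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event record: a type, a trigger, and role/argument pairs."""
    event_type: str
    trigger: str
    arguments: list[tuple[str, str]] = field(default_factory=list)


# Assumed separators; the paper's real flat format may differ.
FIELD_SEP = " | "
EVENT_SEP = " ; "


def linearize(events: list[Event]) -> str:
    """Turn structured events into one flat string a decoder could be trained to emit."""
    parts = []
    for ev in events:
        fields = [f"type: {ev.event_type}", f"trigger: {ev.trigger}"]
        fields += [f"{role}: {text}" for role, text in ev.arguments]
        parts.append(FIELD_SEP.join(fields))
    return EVENT_SEP.join(parts)


def parse(flat: str) -> list[Event]:
    """Recover events from a flat string, skipping malformed fields and events."""
    events = []
    for chunk in flat.split(EVENT_SEP):
        if not chunk.strip():
            continue
        event_type, trigger, arguments = None, None, []
        for item in chunk.split(FIELD_SEP):
            key, sep, value = item.partition(": ")
            if not sep:
                continue  # no "key: value" shape, ignore this field
            key, value = key.strip(), value.strip()
            if key == "type" and event_type is None:
                event_type = value
            elif key == "trigger" and trigger is None:
                trigger = value
            else:
                arguments.append((key, value))
        # Keep only events that have both a type and a trigger.
        if event_type is not None and trigger is not None:
            events.append(Event(event_type, trigger, arguments))
    return events


example = [
    Event("Attack", "fired", [("Attacker", "the soldiers"), ("Target", "the convoy")])
]
flat = linearize(example)
restored = parse(flat)
```

A flat string like this stays close to ordinary text, which is the motivation the abstract gives for the format. The tolerant parser reflects that generated output can be malformed and should degrade gracefully instead of failing.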

Code Implementations: 1 repo
