DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset
This dataset addresses the lack of multimodal, multi-party resources for discourse parsing, enabling research in more realistic dialogue understanding.
The authors constructed DraDDP, the first publicly available English multimodal dataset for multi-party dialogue discourse parsing, containing 495 segments with 6,374 utterances and 9.1 hours of video. Benchmarks show that multimodal information helps capture dialogue structure and relation types.
Multi-party dialogue discourse parsing aims to identify the dependency structure and relation types between utterances in a conversation. Previous studies are mostly limited to the textual modality or to two-party dialogue, so they do not cover multimodal, multi-party settings. In this paper, we construct DraDDP, the first publicly available English multimodal dataset for multi-party dialogue discourse parsing, built from American TV dramas. DraDDP contains 495 dialogue segments with 6,374 utterances and 9.1 hours of parallel video content, covering rich multi-party interaction scenarios. Moreover, we establish comprehensive benchmarks by evaluating this task on DraDDP and analyzing in depth the impact of different modalities. Experimental results demonstrate the value of multimodal information in capturing dialogue structures and relation types. We will publicly release the dataset, annotation guidelines, and code to promote future research in multimodal dialogue understanding.
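To make the task concrete, the following is a minimal sketch of how a discourse parse might be represented and scored. It is an illustration under stated assumptions, not the DraDDP release format or the authors' evaluation code: the `Edge` class, the relation labels ("QA", "Ack", "Comment"), and the two scoring functions are all hypothetical. The scoring follows the common convention in dialogue discourse parsing of reporting one F1 for predicted links alone and one for links together with their relation type; whether DraDDP uses exactly these metrics is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One discourse dependency between two utterances (hypothetical schema)."""
    head: int      # index of the earlier utterance being responded to
    dep: int       # index of the dependent (later) utterance
    relation: str  # relation type; labels here are illustrative only


def f1(pred: set, gold: set) -> float:
    """F1 over two sets of items; returns 0.0 when there is no overlap."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def link_f1(pred: list, gold: list) -> float:
    """Score the dependency structure only, ignoring relation types."""
    return f1({(e.head, e.dep) for e in pred}, {(e.head, e.dep) for e in gold})


def link_rel_f1(pred: list, gold: list) -> float:
    """Score structure and relation type jointly: an edge counts only if both match."""
    return f1(set(pred), set(gold))


# Toy four-utterance dialogue: utterances 1 and 2 both answer utterance 0,
# and utterance 3 acknowledges utterance 2.
gold = [Edge(0, 1, "QA"), Edge(0, 2, "QA"), Edge(2, 3, "Ack")]
# A hypothetical parser output with one wrong attachment and one wrong label.
pred = [Edge(0, 1, "QA"), Edge(1, 2, "QA"), Edge(2, 3, "Comment")]

print(round(link_f1(pred, gold), 3))      # 0.667
print(round(link_rel_f1(pred, gold), 3))  # 0.333
```

In the toy example, two of the three predicted links are correct, but only one of them also carries the correct relation type, which is why the joint score is lower than the link-only score.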