Beyond The Wall Street Journal: Anchoring and Comparing Discourse Signals across Genres
This work addresses the need for broader genre coverage in discourse analysis, although its contribution is incremental: it extends an existing method to new data.
The paper tackles the limitation that existing discourse relation corpora are confined to the news domain. It adapts a signal identification scheme to three additional genres, analyzes the distribution of signaling devices, and provides a taxonomy of indicative signals.
Recent research on discourse relations has found that they are cued not only by discourse markers (DMs) but also by other textual signals, and that signaling information is indicative of genre. Several corpora with discourse relation signaling information exist, such as the Penn Discourse Treebank (PDTB, Prasad et al. 2008) and the Rhetorical Structure Theory Signalling Corpus (RST-SC, Das and Taboada 2018). However, both annotate the Wall Street Journal (WSJ) section of the Penn Treebank (PTB, Marcus et al. 1993), which is limited to the news domain. This paper therefore adapts the signal identification and anchoring scheme of Liu and Zeldes (2019) to three additional genres, examines the distribution of signaling devices across relations and genres, and provides a taxonomy of indicative signals found in this dataset.
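Comparing signaling devices across relations and genres amounts to tabulating signal types by grouping variable and normalizing within each group. The sketch below illustrates that bookkeeping under stated assumptions. The `(genre, relation, signal_type)` triple format, the function name `signal_distribution`, and every annotation value are hypothetical. They are invented for illustration and are not the paper's data or code.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations: (genre, relation, signal_type) triples.
# Invented for illustration only; these are NOT figures from the paper.
ANNOTATIONS = [
    ("news", "cause", "dm"),
    ("news", "cause", "lexical"),
    ("genre_a", "elaboration", "semantic"),
    ("genre_a", "elaboration", "semantic"),
    ("genre_b", "sequence", "graphical"),
    ("genre_b", "sequence", "dm"),
]


def signal_distribution(annotations, by="genre"):
    """Return {group: {signal_type: proportion}}, normalized within each group.

    `by` selects the grouping variable: "genre" or "relation".
    """
    idx = {"genre": 0, "relation": 1}[by]
    counts = defaultdict(Counter)
    for triple in annotations:
        # Count each signal type under its genre or relation.
        counts[triple[idx]][triple[2]] += 1
    dist = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        dist[group] = {sig: n / total for sig, n in counter.items()}
    return dist


if __name__ == "__main__":
    for group, shares in sorted(signal_distribution(ANNOTATIONS).items()):
        print(group, shares)
```

Normalizing within each group, rather than over the whole corpus, keeps genres of different sizes comparable. Passing `by="relation"` gives the per-relation view instead.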