CL, Jun 5, 2024

Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing

Shafiuddin Rehan Ahmed, Zhiyong Eric Wang, George Arthur Baker, Kevin Stowe, James H. Martin

arXiv:2407.11988v1

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck for NLP researchers by providing a harder dataset for event coreference resolution, particularly for figurative language. The advance is incremental because it builds on an existing dataset.

The authors address the lack of lexical diversity and figurative language in cross-document event coreference resolution datasets by creating ECB+META, a variant of ECB+ produced with ChatGPT-based metaphoric paraphrasing. Existing methods that work well on ECB+ show performance drops on this more challenging dataset.

The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two issues by introducing ECB+META, a lexically rich variant of Event Coref Bank Plus (ECB+) for CDEC on symbolic and metaphoric language. We use ChatGPT as a tool for the metaphoric transformation of sentences in the documents of ECB+, then tag the original event triggers in the transformed sentences in a semi-automated manner. In this way, we avoid the re-annotation of expensive coreference links. We present results that show existing methods that work well on ECB+ struggle with ECB+META, thereby paving the way for CDEC research on a much more challenging dataset. Code/data: https://github.com/ahmeshaf/llms_coref

