MAILEX: Email Event and Argument Extraction
This work addresses the problem of extracting structured events from email conversations for domain-specific NLP applications, though its contribution is incremental, since it adapts existing extraction methods to a new domain.
The authors created MailEx, the first dataset for event extraction from conversational email threads, containing 1.5K threads with ~8K event instances, and found that current approaches struggle with challenges like non-continuous triggers and non-named entity arguments.
In this work, we present MailEx, the first dataset for event extraction from conversational email threads. To this end, we first proposed a new taxonomy covering 10 event types and 76 arguments in the email domain. Our final dataset includes 1.5K email threads and ~4K emails, annotated with ~8K event instances in total. To understand the task challenges, we conducted a series of experiments comparing three types of approaches: fine-tuned sequence labeling, fine-tuned generative extraction, and few-shot in-context learning. Our results showed that email event extraction is far from solved, owing to challenges such as extracting non-continuous, shared trigger spans, extracting non-named-entity arguments, and modeling the email conversational history. Our work thus calls for further investigation of this domain-specific event extraction task.
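To make the non-continuous trigger challenge concrete, the following minimal sketch shows what happens when a trigger made of two separated pieces is encoded with plain BIO tags, as a sequence labeler would see it. The example sentence, the `TRIGGER` label, and the helper names `spans_to_bio` and `count_segments` are assumptions made for illustration; they are not taken from the MailEx data or code.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int]], label: str) -> List[str]:
    """Tag tokens with BIO labels; each (start, end) span is end-exclusive."""
    tags = ["O"] * num_tokens
    for start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def count_segments(tags: List[str]) -> int:
    """Count how many separate segments a plain BIO decoder would read off."""
    return sum(tag.startswith("B-") for tag in tags)


# Hypothetical sentence, invented for illustration only.
tokens = ["Could", "you", "please", "send", "the", "report", "over", "?"]

# One trigger made of two separated pieces: "send" ... "over".
trigger_spans = [(3, 4), (6, 7)]

tags = spans_to_bio(len(tokens), trigger_spans, "TRIGGER")
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")

# A plain BIO decoder sees two segments, with nothing saying they form one trigger.
print("segments:", count_segments(tags))
```

Under this encoding the two pieces of the trigger are indistinguishable from two independent triggers, which is one plausible reason a sequence labeling approach would need extra machinery to link them.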