Semi-supervised News Discourse Profiling with Contrastive Learning
The paper addresses data scarcity in news discourse analysis, a task that is useful for downstream applications, but the contribution is incremental: it applies semi-supervised learning to an existing task.
The paper tackles the problem of limited annotated data for news discourse profiling by proposing a semi-supervised approach based on contrastive learning; the authors report that evaluation demonstrates its effectiveness.
News Discourse Profiling seeks to scrutinize the event-related role of each sentence in a news article and has been proven useful across various downstream applications. Specifically, within the context of a given news discourse, each sentence is assigned to a pre-defined category contingent upon its depiction of the news event structure. However, existing approaches suffer from an inadequacy of available human-annotated data, due to the laborious and time-intensive nature of generating discourse-level annotations. In this paper, we present a novel approach, denoted as Intra-document Contrastive Learning with Distillation (ICLD), for addressing the news discourse profiling task, capitalizing on its unique structural characteristics. Notably, we are the first to apply a semi-supervised methodology within this task paradigm, and evaluation demonstrates the effectiveness of the presented approach.