CL, Feb 20, 2018

TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection

Tirthankar Ghosal, Amitra Salam, Swati Tiwari, Asif Ekbal, Pushpak Bhattacharyya

arXiv:1802.06950v1

Originality: Synthesis-oriented

AI Analysis

This paper provides a benchmark resource for evaluating novelty detection techniques in NLP applications such as summarization and news tracking. The contribution is incremental, since it focuses on dataset creation rather than a new method.

The authors address the lack of a benchmark dataset for document-level novelty detection by creating TAP-DLND 1.0, a corpus built by periodically crawling event-specific news documents across several domains, and demonstrate its use with a system they developed for the task.

Detecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc. Important though the problem is, we are unaware of any benchmark document level data that correctly addresses the evaluation of automatic novelty detection techniques in a classification framework. To bridge this gap, we present here a resource for benchmarking the techniques for document level novelty detection. We create the resource via event-specific crawling of news documents across several domains in a periodic manner. We release the annotated corpus with necessary statistics and show its use with a developed system for the problem in concern.
