IRAICL · Apr 30, 2016

An Improved System for Sentence-level Novelty Detection in Textual Streams

arXiv:1605.00122v1 · 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting novel events in unpredictable web data streams, although it appears to be an incremental advance, since it builds on existing TF-IDF and LSH methods.

The paper tackles novelty detection in large textual streams from the web. It presents a system that adapts to new terms and reduces miss probability by approximately 16% relative to a baseline system on a Google News benchmark dataset.
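Adapting "to new terms" in a TF-IDF vector space means that both the vocabulary and the document frequencies have to grow as the stream arrives, rather than being fixed by an offline corpus. The minimal sketch below shows that idea under stated assumptions: the class name `IncrementalTfIdf` is hypothetical, and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` is a common choice, not necessarily the one used in the paper.

```python
import math
from collections import Counter


class IncrementalTfIdf:
    """Hypothetical sketch: TF-IDF whose vocabulary and document
    frequencies grow as each new document arrives from the stream."""

    def __init__(self):
        self.n_docs = 0          # number of documents seen so far
        self.df = Counter()      # term -> number of documents containing it

    def update(self, tokens):
        """Fold one document's terms into the running statistics."""
        self.n_docs += 1
        self.df.update(set(tokens))  # each term counted once per document

    def vectorize(self, tokens):
        """Weight a document with the *current* IDF values and L2-normalise.
        The smoothed IDF below is an assumption, not the paper's formula.
        Unseen terms get df = 0 and hence the largest IDF."""
        tf = Counter(tokens)
        vec = {}
        for term, count in tf.items():
            idf = math.log((1 + self.n_docs) / (1 + self.df[term])) + 1.0
            vec[term] = count * idf
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else vec
```

In a streaming setting, each incoming document would typically be vectorised against the statistics seen so far and only then passed to `update`, so a brand-new term such as "volcano" immediately receives a high weight without any re-training of the model.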

Novelty detection in news events has long been a difficult problem. Several models have performed well on specific data streams, but some issues remain far from solved, particularly in large data streams from the WWW, where the unpredictability of new terms requires the vector space model to adapt. We present a novel event detection system based on Incremental Term Frequency-Inverse Document Frequency (TF-IDF) weighting combined with Locality Sensitive Hashing (LSH). Our system can adapt efficiently and effectively to any new terms in the data streams by continually updating the vector space model. In terms of miss probability, our proposed novelty detection framework outperforms a recognised baseline system by approximately 16% when evaluated on a benchmark dataset from Google News.
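The abstract does not spell out how LSH and the novelty decision fit together, so the sketch below uses a generic first-story-detection scheme as a stand-in: random-hyperplane LSH for cosine similarity, where a document's novelty score is one minus its highest cosine similarity to earlier documents that share an LSH bucket. It also computes miss probability as the fraction of truly novel documents that the system fails to flag. All names (`LshNoveltyDetector`, `miss_probability`) and parameter values (`n_tables`, `n_bits`, `threshold`) are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class LshNoveltyDetector:
    """Hypothetical sketch of first-story detection with random-hyperplane LSH.
    Hyperplane components are drawn lazily per term, so terms never seen
    before extend the space without re-indexing earlier documents."""

    def __init__(self, n_tables=4, n_bits=8, threshold=0.5, seed=0):
        self.n_tables, self.n_bits, self.threshold = n_tables, n_bits, threshold
        self.rng = random.Random(seed)
        self.planes = {}                  # term -> Gaussian component per hyperplane
        self.buckets = defaultdict(list)  # (table, signature) -> doc ids
        self.docs = []                    # stored vectors, indexed by doc id

    def _components(self, term):
        if term not in self.planes:       # new term: draw its hyperplane components
            self.planes[term] = [self.rng.gauss(0.0, 1.0)
                                 for _ in range(self.n_tables * self.n_bits)]
        return self.planes[term]

    def _signatures(self, vec):
        """One bit per hyperplane: the sign of the projection of vec onto it."""
        proj = [0.0] * (self.n_tables * self.n_bits)
        for term, w in vec.items():
            for i, c in enumerate(self._components(term)):
                proj[i] += w * c
        bits = [p >= 0.0 for p in proj]
        return [(t, tuple(bits[t * self.n_bits:(t + 1) * self.n_bits]))
                for t in range(self.n_tables)]

    def process(self, vec):
        """Score a document against earlier colliding documents, then index it.
        Returns (novelty score, is_novel)."""
        keys = self._signatures(vec)
        candidates = {d for k in keys for d in self.buckets.get(k, ())}
        best = max((cosine(vec, self.docs[d]) for d in candidates), default=0.0)
        score = 1.0 - best
        doc_id = len(self.docs)
        self.docs.append(vec)
        for k in keys:
            self.buckets[k].append(doc_id)
        return score, score > self.threshold


def miss_probability(predicted, truth):
    """Fraction of truly novel documents that were not flagged as novel."""
    novel = [p for p, t in zip(predicted, truth) if t]
    return sum(1 for p in novel if not p) / len(novel) if novel else 0.0
```

LSH is what keeps this tractable on a large stream: each document is compared only with the few earlier documents that share a bucket, not with the entire history. The price is that a true near-duplicate which lands in no shared bucket is missed; using more tables reduces that risk at the cost of extra memory.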
