CLIR, Oct 6, 2016

A Robust Framework for Classifying Evolving Document Streams in an Expert-Machine-Crowd Setting

arXiv:1610.01858v1, 12 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of evolving document classification for social media analysts, but the contribution appears incremental because it builds on existing triad and clustering methods.

The paper tackles the challenge of keeping categories up to date in the online classification of social media data streams. It proposes an Expert-Machine-Crowd (EMC) framework built around COD-Means, an optimization problem that unifies constrained clustering and outlier detection, to detect novel concepts and improve categorization quality. Experiments on real data sets indicate that the approach is both effective and efficient.

An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and outlier detection by formulating a novel optimization problem: COD-Means. We design an algorithm to solve the COD-Means problem and show that COD-Means will not only help detect novel categories but also seamlessly discover human annotation errors and improve the overall quality of the categorization process. Experiments on diverse real data sets demonstrate that our approach is both effective and efficient.
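
The abstract does not spell out the COD-Means formulation, so the following is only a generic illustration of the idea of combining clustering, constraints, and outlier detection, not the authors' algorithm. It is a minimal sketch of a k-means loop that sets aside the farthest points as outliers on each round, with optional pinned labels as a crude stand-in for expert constraints. The function name `kmeans_with_outliers`, its parameters, and the toy data are all hypothetical.

```python
import math


def kmeans_with_outliers(points, centers, n_outliers, fixed=None, iters=20):
    """Toy k-means that sets aside the n_outliers farthest points each round.

    points: list of coordinate tuples; centers: initial list of centers.
    fixed: optional {point_index: cluster_index} pins, an assumed stand-in
    for expert-provided constraints. Not the paper's COD-Means.
    """
    fixed = fixed or {}
    centers = [tuple(c) for c in centers]
    assign, outliers = [], set()
    for _ in range(iters):
        # Assignment step: pinned points keep their cluster, others go nearest.
        assign, dists = [], []
        for i, p in enumerate(points):
            if i in fixed:
                j = fixed[i]
            else:
                j = min(range(len(centers)),
                        key=lambda c: math.dist(p, centers[c]))
            assign.append(j)
            dists.append(math.dist(p, centers[j]))
        # Outlier step: unpinned points farthest from their center are set aside.
        free = [i for i in range(len(points)) if i not in fixed]
        free.sort(key=lambda i: dists[i], reverse=True)
        outliers = set(free[:n_outliers])
        # Update step: recompute each center from its non-outlier members.
        new_centers = []
        for j, c in enumerate(centers):
            members = [points[i] for i in range(len(points))
                       if assign[i] == j and i not in outliers]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center as-is
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign, outliers


# Two tight groups plus one far-away point (index 6).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
centers, assign, outliers = kmeans_with_outliers(
    pts, [(0, 0), (10, 10)], n_outliers=1)
print(outliers)  # the far-away point is flagged: {6}
```

In an EMC-style workflow, points flagged this way could be candidates for a novel category or for a second look at their annotation, though how the paper actually routes them is not stated in the abstract.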

