IR CL LG · Jan 16, 2014

Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback

arXiv:1401.5389v1 · 30 citations

Originality: Incremental advance

AI Analysis

The paper addresses the need for customizable clustering in text analysis, but the contribution is incremental: it builds on prior interactive methods, and its main advance is reducing the human effort they require.

The paper tackles the problem of clustering documents along a user-specified dimension (e.g., mood or gender) rather than the default topical one. It proposes an active clustering algorithm that needs only minimal user feedback, in the form of inspecting a small number of words. It demonstrates viability on sentiment datasets, though the abstract reports no concrete performance numbers.

While traditional research on text clustering has largely focused on grouping documents by topic, it is conceivable that a user may want to cluster documents along other dimensions, such as the author's mood, gender, age, or sentiment. Without knowing the user's intention, a clustering algorithm will only group documents along the most prominent dimension, which may not be the one the user desires. To address the problem of clustering documents along the user-desired dimension, previous work has focused on learning a similarity metric from data manually annotated with the user's intention or having a human construct a feature space in an interactive manner during the clustering process. With the goal of reducing reliance on human knowledge for fine-tuning the similarity function or selecting the relevant features required by these approaches, we propose a novel active clustering algorithm, which allows a user to easily select the dimension along which she wants to cluster the documents by inspecting only a small number of words. We demonstrate the viability of our algorithm on a variety of commonly used sentiment datasets.
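The abstract does not say how the candidate dimensions are produced, so the following is only a minimal sketch under an assumed spectral formulation (not confirmed by this page). Documents are compared by cosine similarity. Each non-trivial eigenvector of the normalized affinity matrix D^-1/2 S D^-1/2 induces one candidate two-way split, and the user picks the split whose most over-represented words match the dimension she cares about (e.g., sentiment words rather than topic words). All function names are hypothetical, and plain power iteration stands in for a proper eigensolver to keep the example dependency-free.

```python
import math
import random
from collections import Counter


def bag_of_words(doc):
    """Lower-cased whitespace tokenization into word counts (assumed; docs must be non-empty)."""
    return Counter(doc.lower().split())


def cosine_matrix(bags):
    """Pairwise cosine similarity S between bag-of-words vectors."""
    norms = [math.sqrt(sum(c * c for c in b.values())) for b in bags]
    n = len(bags)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(c * bags[j][w] for w, c in bags[i].items())
            if norms[i] and norms[j]:
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def orthogonalize(v, basis):
    """Remove the components of v along each (unit-length) basis vector."""
    for b in basis:
        proj = sum(x * y for x, y in zip(v, b))
        v = [x - proj * y for x, y in zip(v, b)]
    return v


def candidate_eigenvectors(sim, k, iters=300, seed=0):
    """Top-k non-trivial eigenvectors of D^-1/2 S D^-1/2 (requires k < number of docs).

    The matrix is shifted by +I so its eigenvalues move from [-1, 1] to [0, 2],
    which makes power iteration pick the largest ones. sqrt(d) is the known
    trivial eigenvector; every new vector is kept orthogonal to it and to the
    vectors already found (deflation via Gram-Schmidt).
    """
    n = len(sim)
    deg = [sum(row) for row in sim]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    m = [[inv_sqrt[i] * sim[i][j] * inv_sqrt[j] + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    basis = [normalize([math.sqrt(d) for d in deg])]
    rng = random.Random(seed)
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            v = normalize(orthogonalize(v, basis))
        basis.append(v)
    return basis[1:]


def split_by_sign(vec):
    """Two-way partition of document indices by the sign of each entry."""
    return ([i for i, x in enumerate(vec) if x >= 0],
            [i for i, x in enumerate(vec) if x < 0])


def top_words(bags, group, other, n_words):
    """Words whose document frequency is most over-represented in `group`."""
    def doc_freq(idx):
        counts = Counter()
        for i in idx:
            counts.update(set(bags[i]))
        return counts
    df_g, df_o = doc_freq(group), doc_freq(other)
    vocab = set(df_g) | set(df_o)
    score = {w: df_g[w] / max(len(group), 1) - df_o[w] / max(len(other), 1)
             for w in vocab}
    ranked = sorted(vocab, key=lambda w: (-score[w], w))
    return [w for w in ranked if score[w] > 0][:n_words]


def candidate_clusterings(docs, k=3, n_words=5):
    """For each candidate dimension: the 2-way split plus the words shown to the user."""
    bags = [bag_of_words(d) for d in docs]
    out = []
    for vec in candidate_eigenvectors(cosine_matrix(bags), k):
        a, b = split_by_sign(vec)
        out.append(((a, b), (top_words(bags, a, b, n_words),
                             top_words(bags, b, a, n_words))))
    return out
```

In use, each candidate's two word lists would be printed, and the user would choose the candidate whose words look like cues for the intended dimension; anything beyond this selection step is not modeled here.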

