IR · DL · Feb 27, 2017

Mutual Information based labelling and comparing clusters

arXiv:1702.08199v1 · 29 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for interpretable clustering results in document analysis, though it is incremental as it applies an existing information-theoretic measure to a specific labeling task.

The paper tackles the problem of labeling automatically generated clusters of journal articles. It proposes selecting, for each cluster, the topical terms with the highest Normalized Mutual Information (NMI) with that cluster as its labels. Discussion with a domain expert served as a check that the resulting labels are discriminating both lexically and semantically.

After a clustering solution is generated automatically, labelling the clusters becomes important for understanding the results. In this paper, we propose to use a Mutual Information based method to label clusters of journal articles. The topical terms that have the highest Normalised Mutual Information (NMI) with a given cluster are selected as the labels of that cluster. Discussion of the labelling technique with a domain expert was used as a check that the labels are discriminating not only lexically but also semantically. Based on a common set of topical terms, we also propose to generate lexical fingerprints as a representation of individual clusters. Finally, we visualise and compare the fingerprints of different clusters, taken either from one clustering solution or from different ones.
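The labelling and fingerprinting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each document is reduced to a set of topical terms, both the term and the cluster are treated as binary variables (term present or absent, document inside or outside the cluster), mutual information is normalised by the geometric mean of the two entropies, and only terms that occur more often inside the cluster than outside are eligible as labels. The function names (`nmi`, `fingerprint`, `label_cluster`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a discrete distribution given as counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def nmi(term_present, in_cluster):
    """NMI between two binary variables observed over the same documents.

    Assumption: normalisation by sqrt(H(T) * H(C)); the paper may use another.
    """
    n = len(term_present)
    joint = Counter(zip(term_present, in_cluster))
    t_counts = Counter(term_present)
    c_counts = Counter(in_cluster)
    mi = sum(
        (n_tc / n) * math.log2(n_tc * n / (t_counts[t] * c_counts[c]))
        for (t, c), n_tc in joint.items()
    )
    h_t = entropy(t_counts.values(), n)
    h_c = entropy(c_counts.values(), n)
    if h_t == 0 or h_c == 0:
        return 0.0  # a constant variable carries no information
    return mi / math.sqrt(h_t * h_c)


def fingerprint(docs, assignments, cluster, vocab):
    """Lexical fingerprint: NMI of every term in a common vocabulary."""
    in_cluster = [a == cluster for a in assignments]
    return {term: nmi([term in d for d in docs], in_cluster) for term in vocab}


def label_cluster(docs, assignments, cluster, vocab, top_k=1):
    """Pick the top_k terms by NMI among terms over-represented in the cluster.

    Assumption: the over-representation filter is added here because NMI alone
    also scores highly for terms that are characteristically absent.
    """
    scores = fingerprint(docs, assignments, cluster, vocab)
    inside = [d for d, a in zip(docs, assignments) if a == cluster]
    outside = [d for d, a in zip(docs, assignments) if a != cluster]

    def rate(term, group):
        return sum(term in d for d in group) / max(1, len(group))

    candidates = [t for t in vocab if rate(t, inside) > rate(t, outside)]
    # Sort by descending NMI, breaking ties alphabetically for determinism.
    return sorted(candidates, key=lambda t: (-scores[t], t))[:top_k]


# Toy data (hypothetical): six documents in two clusters.
docs = [
    {"galaxy", "star", "survey"},
    {"galaxy", "redshift", "survey"},
    {"star", "galaxy", "model"},
    {"protein", "cell", "model"},
    {"protein", "gene", "survey"},
    {"cell", "gene", "protein"},
]
assignments = [0, 0, 0, 1, 1, 1]
vocab = sorted(set().union(*docs))

for cluster in sorted(set(assignments)):
    print(cluster, label_cluster(docs, assignments, cluster, vocab))
```

Because every fingerprint is computed over the same vocabulary, fingerprints of clusters from one clustering solution or from different solutions can be compared term by term, for example by plotting them side by side or by computing a similarity between the score vectors.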
