LGMar 23, 2021

Pairwise Adjusted Mutual Information

arXiv:2103.12641v115 citations
Originality Incremental advance
AI Analysis

This addresses the computational bottleneck for researchers and practitioners using clustering evaluation metrics, though it is an incremental improvement over existing methods.

The paper tackles the computational expense of adjusted mutual information (AMI) for clustering similarity by proposing a novel adjustment based on pairwise label permutations instead of full permutations. The result is a metric with similar quality to standard AMI but much lower time complexity, as demonstrated on synthetic and real data.

A well-known metric for quantifying the similarity between two clusterings is the adjusted mutual information. Compared to mutual information, a corrective term based on random permutations of the labels is introduced, preventing two clusterings being similar by chance. Unfortunately, this adjustment makes the metric computationally expensive. In this paper, we propose a novel adjustment based on {pairwise} label permutations instead of full label permutations. Specifically, we consider permutations where only two samples, selected uniformly at random, exchange their labels. We show that the corresponding adjusted metric, which can be expressed explicitly, behaves similarly to the standard adjusted mutual information for assessing the quality of a clustering, while having a much lower time complexity. Both metrics are compared in terms of quality and performance on experiments based on synthetic and real data.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes