ME · ML · Aug 15, 2019

Pearson Distance is not a Distance

arXiv:1908.06029v1 · 11 citations
AI Analysis

This paper addresses a foundational issue for researchers in fields such as gene expression analysis and brain imaging, who rely on metric assumptions for clustering.

The paper demonstrates that the widely used Pearson distance (1-ρ) is not a metric because it violates the triangle inequality, and recalls that √(1-ρ) is a valid metric. Analogously, the sign-invariant measure 1-|ρ| is not a metric, while √(1-ρ²) is.

The Pearson distance between a pair of random variables $X,Y$ with correlation $ρ_{xy}$, namely $1-ρ_{xy}$, has gained widespread use, particularly for clustering, in areas such as gene expression analysis, brain imaging and cyber security. In all these applications it is implicitly assumed or required that the distance measures be metrics, and thus satisfy the triangle inequality. We show, however, that Pearson distance is not a metric. We go on to show that this can be repaired by recalling the result (well known in other literature) that $\sqrt{1-ρ_{xy}}$ is a metric. We similarly show that a related measure of interest, $1-|ρ_{xy}|$, which is invariant to the sign of $ρ_{xy}$, is not a metric but that $\sqrt{1-ρ_{xy}^2}$ is. We also give generalizations of these results.
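The triangle-inequality claims can be checked numerically on a single triple of variables. The sketch below is illustrative only and is not taken from the paper: it assumes a hypothetical construction in which $X$ and $Z$ are uncorrelated with unit variance and $Y = (X+Z)/\sqrt{2}$, which gives $ρ_{xy} = ρ_{yz} = 1/\sqrt{2}$ and $ρ_{xz} = 0$. The function names are likewise made up for the example. A failure on one triple is enough to show a measure is not a metric; a pass on one triple is only consistent with the metric property and does not prove it.

```python
import math

# Hypothetical triple (an assumption for illustration, not the paper's example):
# X and Z uncorrelated with unit variance, Y = (X + Z) / sqrt(2).
rho_xy = 1 / math.sqrt(2)
rho_yz = 1 / math.sqrt(2)
rho_xz = 0.0

# The four candidate distances discussed in the abstract.
candidates = {
    "1 - rho": lambda r: 1 - r,
    "sqrt(1 - rho)": lambda r: math.sqrt(1 - r),
    "1 - |rho|": lambda r: 1 - abs(r),
    "sqrt(1 - rho^2)": lambda r: math.sqrt(1 - r * r),
}

def triangle_holds(d, tol=1e-12):
    """Check d(X, Z) <= d(X, Y) + d(Y, Z) for this one triple."""
    return d(rho_xz) <= d(rho_xy) + d(rho_yz) + tol

results = {name: triangle_holds(d) for name, d in candidates.items()}

for name, ok in results.items():
    status = "holds" if ok else "VIOLATED"
    print(f"{name:>16}: triangle inequality {status}")
```

On this triple, $1-ρ$ gives $d(X,Z) = 1$ but $d(X,Y) + d(Y,Z) = 2 - \sqrt{2} \approx 0.586$, so the inequality fails; the same numbers apply to $1-|ρ|$ because all three correlations are non-negative. The square-root versions give sums of roughly 1.082 and 1.414, both at least 1.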
