LG · IT · Mar 15, 2022

On Suspicious Coincidences and Pointwise Mutual Information

arXiv:2203.08089v3 · 9 citations · h-index: 2
AI Analysis

This work addresses the statistical analysis of co-occurrence events and is relevant to researchers who use PMI for anomaly detection. Its contribution is incremental, since it primarily reviews and compares existing measures.

The paper examines measures of association for $2 \times 2$ contingency tables, focusing on how mutual information (MI) and pointwise mutual information (PMI) relate to Yule's $Y$ once the effect of the marginal probabilities is removed. It also highlights PMI's sensitivity to the marginals: sparser events receive higher scores.

Barlow (1985) hypothesized that the co-occurrence of two events $A$ and $B$ is "suspicious" if $P(A,B) \gg P(A) P(B)$. We first review classical measures of association for $2 \times 2$ contingency tables, including Yule's $Y$ (Yule, 1912), which depends only on the odds ratio $\lambda$, and is independent of the marginal probabilities of the table. We then discuss the mutual information (MI) and pointwise mutual information (PMI), which depend on the ratio $P(A,B)/(P(A)P(B))$, as measures of association. We show that, once the effect of the marginals is removed, MI and PMI behave similarly to $Y$ as functions of $\lambda$. The pointwise mutual information is used extensively in some research communities for flagging suspicious coincidences, but it is important to bear in mind the sensitivity of the PMI to the marginals, with increased scores for sparser events.
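
The contrast between $Y$ and the PMI can be illustrated numerically. The sketch below is not taken from the paper. It assumes the standard definitions $\lambda = P(A,B)\,P(\bar{A},\bar{B}) / (P(A,\bar{B})\,P(\bar{A},B))$, $Y = (\sqrt{\lambda} - 1)/(\sqrt{\lambda} + 1)$ and $\mathrm{PMI} = \log [P(A,B)/(P(A)P(B))]$ with a natural logarithm. The function names, the base table, and the scaling factor `s` are hypothetical choices made for illustration. Multiplying the $A$ row and the $B$ column of a table by `s` and renormalizing leaves the odds ratio unchanged, so any change in the PMI comes from the marginals alone.

```python
import math


def association_measures(p11, p10, p01, p00):
    """Return (odds_ratio, yule_y, pmi) for a 2x2 table of joint probabilities.

    p11 = P(A, B), p10 = P(A, not B), p01 = P(not A, B), p00 = P(not A, not B).
    The PMI uses the natural logarithm (an assumption of this sketch).
    """
    total = p11 + p10 + p01 + p00
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError("cell probabilities must sum to 1")
    if min(p11, p10, p01, p00) <= 0:
        raise ValueError("all cells must be strictly positive")

    p_a = p11 + p10  # marginal P(A)
    p_b = p11 + p01  # marginal P(B)

    odds_ratio = (p11 * p00) / (p10 * p01)
    root = math.sqrt(odds_ratio)
    yule_y = (root - 1.0) / (root + 1.0)
    pmi = math.log(p11 / (p_a * p_b))
    return odds_ratio, yule_y, pmi


def rescale_marginals(table, s):
    """Scale the A row and the B column by s, then renormalize.

    The cell (A, B) picks up a factor s * s, the two off-diagonal cells a
    factor s, and (not A, not B) is unchanged, so the odds ratio is preserved.
    Values of s below 1 make both A and B rarer.
    """
    p11, p10, p01, p00 = table
    cells = (s * s * p11, s * p10, s * p01, p00)
    total = sum(cells)
    return tuple(c / total for c in cells)


# Hypothetical base table with P(A) = P(B) = 0.5.
BASE_TABLE = (0.4, 0.1, 0.1, 0.4)

if __name__ == "__main__":
    for s in (1.0, 0.3, 0.1, 0.01):
        table = rescale_marginals(BASE_TABLE, s)
        lam, y, pmi = association_measures(*table)
        p_a = table[0] + table[1]
        print(f"s={s:<5} P(A)={p_a:.4f} lambda={lam:.3f} Y={y:.3f} PMI={pmi:.3f}")
```

Under these assumptions, $\lambda$ and $Y$ stay fixed across the rows printed, while the PMI grows as $A$ and $B$ become rarer and approaches $\log \lambda$ from below for this table. This is the sense in which a PMI threshold favours sparse events.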

