Revisiting Graph Homophily Measures
This work addresses a methodological gap for researchers and practitioners in graph machine learning by providing a more reliable homophily measure. It is incremental, however: it builds on prior definitions and improves existing metrics rather than introducing a new paradigm.
The paper addresses the drawbacks of existing graph homophily measures, notably that they cannot be reliably compared across datasets with different numbers of classes and class size balance. It introduces a new measure, unbiased homophily, that satisfies all the desirable properties and works reliably for undirected graphs. Theoretical analysis and empirical examples show that the new measure behaves better than existing ones. For directed graphs, the paper proves that some desirable properties contradict each other, so no measure can satisfy all of them.
Homophily is a graph property describing the tendency of edges to connect similar nodes. Several measures are used to assess homophily, but all are known to have drawbacks: in particular, they cannot be reliably used to compare datasets with different numbers of classes and different class size balance. To show this, previous works on graph homophily suggested several properties desirable for a good homophily measure, also noting that no existing homophily measure has all these properties. Our paper addresses this issue by introducing a new homophily measure, unbiased homophily, that has all the desirable properties and thus can be reliably used across datasets with different label distributions. The proposed measure is suitable for undirected (and possibly weighted) graphs. We show both theoretically and via empirical examples that the existing homophily measures have serious drawbacks, while unbiased homophily behaves as desired in the considered scenarios. Finally, for directed graphs, we prove that some desirable properties contradict each other and thus a measure satisfying all of them cannot exist.
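The drawback described above can be illustrated with a small sketch. It uses edge homophily, a commonly used measure defined as the fraction of edges whose endpoints share a class label. That choice is an assumption made for illustration: the text above does not name the existing measures it refers to, and this is not the proposed unbiased homophily measure, whose definition is not given here. The function name and the toy label assignments are hypothetical. On a complete graph the edges carry no information about the labels, yet the value changes substantially with the number of classes and their balance.

```python
from fractions import Fraction
from itertools import combinations


def edge_homophily(edges, labels):
    """Return the fraction of edges whose two endpoints share a label."""
    edges = list(edges)
    if not edges:
        raise ValueError("edge homophily is undefined for a graph with no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return Fraction(same, len(edges))


# Complete undirected graph on 8 nodes: every pair is connected, so the
# edge structure is independent of any labeling (28 edges in total).
complete_edges = list(combinations(range(8), 2))

# Three hypothetical labelings of the same graph.
balanced_two_classes = [0, 0, 0, 0, 1, 1, 1, 1]
balanced_four_classes = [0, 0, 1, 1, 2, 2, 3, 3]
imbalanced_two_classes = [0, 0, 0, 0, 0, 0, 0, 1]

h_two = edge_homophily(complete_edges, balanced_two_classes)
h_four = edge_homophily(complete_edges, balanced_four_classes)
h_skewed = edge_homophily(complete_edges, imbalanced_two_classes)

print(h_two)     # 3/7
print(h_four)    # 1/7
print(h_skewed)  # 3/4
```

The same label-independent graph scores 1/7, 3/7, or 3/4 depending only on the label distribution, which is why raw values of this kind are hard to compare across datasets.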