Graph Fourier MMD for Signals on Graphs
This addresses the need for distribution comparison on graphs in biomedical sciences, offering a novel method for tasks like gene selection, though it appears incremental as it builds on existing MMD concepts for graphs.
The authors tackled the problem of comparing distributions and signals on graphs, which is important for biomedical data like protein networks and single-cell RNA-sequencing, by proposing Graph Fourier MMD (GFMMD), a novel distance method that finds an analytical solution and is applied to benchmark datasets and gene clustering with meaningful results.
While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little attention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel distance between distributions and signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called "gene localization score" which helps select genes for cellular state space characterization.