A Markov Random Field model for Hypergraph-based Machine Learning
This work aims to improve machine learning on hypergraph data for researchers and practitioners. It offers a new method for a known bottleneck, the lack of an explicit model of how hypergraph data are generated, rather than an incremental advance.
The paper tackles the challenge of modeling data-generating processes on hypergraphs. It develops a hypergraph Markov random field that models the joint distribution of node and hyperedge features as a multivariate Gaussian whose covariance is determined by the hypergraph structure. It then introduces two frameworks: HGSI for structure inference and Hypergraph-MLP for node classification. HGSI outperforms existing methods on synthetic and real-world data, and Hypergraph-MLP outperforms baselines on six node classification benchmarks.
Understanding the data-generating process is essential for building machine learning models that generalise well while ensuring robustness and interpretability. This paper addresses the fundamental challenge of modelling the data-generating process on hypergraphs and explores how such models can inform the design of machine learning algorithms for hypergraph data. The key to our approach is a hypergraph Markov random field that models the joint distribution of the node features and hyperedge features in a hypergraph through a multivariate Gaussian distribution whose covariance matrix is uniquely determined by the hypergraph structure. The proposed data-generating process provides a valuable inductive bias for various hypergraph machine learning tasks, thereby informing algorithm design. In this paper, we focus on two representative downstream tasks: structure inference and node classification. Accordingly, we introduce two novel frameworks: 1) an original hypergraph structure inference framework named HGSI, and 2) a novel learning framework entitled Hypergraph-MLP for node classification on hypergraphs. Empirical evaluation of the proposed frameworks demonstrates that: 1) HGSI outperforms existing hypergraph structure inference methods on both synthetic and real-world data; and 2) Hypergraph-MLP outperforms baselines on six hypergraph node classification benchmarks, while also improving runtime efficiency and robustness to structural perturbations at inference time.
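To make the modelling idea concrete, the following is a minimal NumPy sketch of a Gaussian Markov random field whose covariance is determined by a hypergraph. The abstract does not state the exact covariance, so this sketch assumes a precision matrix `L + alpha * I` built from the normalized hypergraph Laplacian of Zhou et al. (2006). That is a stand-in, not necessarily the paper's construction. The sketch also models node features only, whereas the paper's model is joint over node and hyperedge features. All function names and the parameter `alpha` are hypothetical.

```python
import numpy as np


def incidence_matrix(num_nodes, hyperedges):
    """Build the |V| x |E| incidence matrix H, with H[v, e] = 1 if node v lies in hyperedge e."""
    H = np.zeros((num_nodes, len(hyperedges)))
    for j, edge in enumerate(hyperedges):
        for v in edge:
            H[v, j] = 1.0
    return H


def hypergraph_laplacian(H, edge_weights=None):
    """Normalized hypergraph Laplacian (Zhou et al., 2006), an assumed choice here:
    L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    n, m = H.shape
    w = np.ones(m) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = H @ w          # weighted node degrees
    de = H.sum(axis=0)  # hyperedge sizes
    if np.any(dv == 0) or np.any(de == 0):
        raise ValueError("every node and hyperedge must have nonzero degree")
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    # Dv^{-1/2} H  @  W De^{-1}  @  H^T Dv^{-1/2}
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n) - theta


def sample_node_features(H, num_features, alpha=1.0, rng=None):
    """Draw num_features i.i.d. feature columns x ~ N(0, (L + alpha*I)^{-1}).
    The precision L + alpha*I is a hypothetical stand-in for the paper's covariance."""
    L = hypergraph_laplacian(H)
    precision = L + alpha * np.eye(L.shape[0])  # positive definite for alpha > 0
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # remove round-off asymmetry before sampling
    rng = np.random.default_rng(rng)
    samples = rng.multivariate_normal(np.zeros(L.shape[0]), cov, size=num_features)
    return samples.T  # shape (num_nodes, num_features)


def smoothness(X, L):
    """tr(X^T L X): the Laplacian-dependent quadratic term of the Gaussian negative log-likelihood."""
    return float(np.trace(X.T @ L @ X))


# Usage: four nodes, two hyperedges {0, 1, 2} and {2, 3}.
H = incidence_matrix(4, [[0, 1, 2], [2, 3]])
X = sample_node_features(H, num_features=3, alpha=0.5, rng=0)
print(X.shape)  # (4, 3)
```

In this sketch, an off-diagonal precision entry is nonzero only when two nodes share a hyperedge, so nodes with no common hyperedge are conditionally independent given all other nodes. A smoothness term such as `smoothness(X, L)` is one way such a model could supply an inductive bias for structure inference or node classification. This is an illustrative assumption, not a description of how HGSI or Hypergraph-MLP are built.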