One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers
This work addresses a design challenge in heterogeneous graph learning and offers researchers and practitioners a new principle for improving generalization and efficiency. The contribution is incremental, since it builds on existing MoE and HGT methods.
The paper tackles overreliance on node/edge type labels in heterogeneous graph neural networks, which impedes cross-type knowledge transfer. It proposes Homogeneous Expert Routing (HER) for Heterogeneous Graph Transformers, which consistently outperforms standard HGT and a type-separated MoE baseline on link prediction across the IMDB, ACM, and DBLP datasets.
A common practice in heterogeneous graph neural networks (HGNNs) is to condition parameters on node/edge types, assuming types reflect semantic roles. However, this can cause overreliance on surface-level labels and impede cross-type knowledge transfer. We explore integrating Mixture-of-Experts (MoE) into HGNNs--a direction underexplored despite MoE's success in homogeneous settings. Crucially, we question the need for type-specific experts. We propose Homogeneous Expert Routing (HER), an MoE layer for Heterogeneous Graph Transformers (HGT) that stochastically masks type embeddings during routing to encourage type-agnostic specialization. Evaluated on IMDB, ACM, and DBLP for link prediction, HER consistently outperforms standard HGT and a type-separated MoE baseline. Analysis on IMDB shows HER experts specialize by semantic patterns (e.g., movie genres) rather than node types, confirming routing is driven by latent semantics. Our work demonstrates that regularizing type dependence in expert routing yields more generalizable, efficient, and interpretable representations--a new design principle for heterogeneous graph learning.
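To make the routing idea concrete, the following is a minimal sketch of stochastic type masking in an MoE router, not the authors' implementation. The abstract states only that type embeddings are stochastically masked during routing. Everything else here is an assumption made for illustration: the function name `route`, the parameters `mask_prob` and `top_k`, concatenating the type embedding with the node features, masking by zeroing, a linear router, and top-k gating with renormalization.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(features, type_emb, router_weights, mask_prob, top_k, rng):
    """Return (expert_index, gate_weight) pairs for one node.

    Assumption: with probability mask_prob the type embedding is zeroed
    before routing, so the router must rely on node features alone.
    """
    if rng.random() < mask_prob:
        type_emb = [0.0] * len(type_emb)
    # Assumption: the router sees features concatenated with the type embedding.
    router_input = features + type_emb
    # Linear router: one weight row per expert.
    logits = [sum(w * x for w, x in zip(row, router_input))
              for row in router_weights]
    probs = softmax(logits)
    # Keep the top_k experts and renormalize their gate weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Hypothetical toy setup: 4 experts, 3 feature dims, 2 type-embedding dims.
rng = random.Random(0)
num_experts, feat_dim, type_dim = 4, 3, 2
router_weights = [[rng.uniform(-1, 1) for _ in range(feat_dim + type_dim)]
                  for _ in range(num_experts)]
features = [0.5, -1.0, 2.0]
movie_type = [1.0, 0.0]  # illustrative one-hot type embeddings
actor_type = [0.0, 1.0]

# With mask_prob=1.0 the type is always masked, so two nodes with the same
# features are routed identically regardless of their node type.
a = route(features, movie_type, router_weights, 1.0, 2, rng)
b = route(features, actor_type, router_weights, 1.0, 2, rng)
```

In this sketch, setting `mask_prob` to 0.0 gives a router that always sees the type embedding, while intermediate values expose it only part of the time, which is one way to read "regularizing type dependence in expert routing."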