LG · AI · CL · May 13

HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts

arXiv:2605.13997 · 83.0
AI Analysis

For practitioners deploying large MoE models, this work addresses a critical blind spot in compression algorithms, enabling more effective inference cost reduction without retraining.

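The paper identifies a structural obstruction in learning-free compression of Sparse Mixture-of-Experts (MoE) layers: pairwise compatibility of experts does not guarantee that a triple is jointly mergeable, and existing compressors that score experts on pairwise signals cannot see this. The authors propose HodgeCover, which Hodge-decomposes the edge-barrier signal to isolate the harmonic kernel and then greedily covers the harmonic-critical edges and triplet-critical triangles. On three open-weight backbones it matches state-of-the-art learning-free baselines on expert reduction, and its hybrid variant (with weight pruning on surviving experts) leads at the most aggressive compression levels.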

Sparse Mixture-of-Experts (MoE) layers route tokens through a handful of experts, and learning-free compression of these layers reduces inference cost without retraining. A subtle obstruction blocks every existing compressor in this family: three experts can each be pairwise compatible yet form an irreducible cycle when merged together, so any score that ranks experts on pairwise signals is structurally blind to which triples are jointly mergeable. We show the obstruction is a precise mathematical object, the harmonic kernel of the simplicial Laplacian on a 2-complex whose vertices are experts, whose edges carry KL merge barriers, and whose faces carry triplet barriers; Hodge-decomposing the edge-barrier signal isolates the kernel exactly. We turn the diagnostic into a selection objective: HodgeCover greedily covers the harmonic-critical edges and triplet-critical triangles, and a hybrid variant of HodgeCover pairs it with off-the-shelf weight pruning on survivors. On three open-weight Sparse MoE backbones under aggressive expert reduction, HodgeCover matches state-of-the-art learning-free baselines on the expert-reduction axis, leads on the aggressive-compression frontier of the hybrid axis, and uniquely balances retained mass across all four Hodge components. These results show that exposing the harmonic kernel of a learned MoE structure changes which compressor wins at the regime that matters most.
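To make the Hodge-decomposition step concrete, here is a minimal NumPy sketch of a generic Hodge decomposition of an edge signal on a small 2-complex. It is not the authors' implementation. The toy complex (four experts, five edges, one filled triangle), the orientation convention, the edge values in `f`, and the helper names `boundary_matrices` and `hodge_decompose` are all assumptions made for illustration. How the paper turns KL merge barriers and triplet barriers into the complex and the signal is not specified in the abstract.

```python
import numpy as np

def boundary_matrices(n_vertices, edges, triangles):
    """Oriented boundary maps of a 2-complex (hypothetical helper).

    edges:     list of (i, j) with i < j
    triangles: list of (i, j, k) with i < j < k, all three edges present
    Returns B1 (vertices x edges) and B2 (edges x triangles).
    """
    edge_index = {e: idx for idx, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)))
    for idx, (i, j) in enumerate(edges):
        B1[i, idx] = -1.0  # edge leaves vertex i
        B1[j, idx] = 1.0   # edge enters vertex j
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, k) in enumerate(triangles):
        # boundary of [i, j, k] = [j, k] - [i, k] + [i, j]
        B2[edge_index[(j, k)], t] = 1.0
        B2[edge_index[(i, k)], t] = -1.0
        B2[edge_index[(i, j)], t] = 1.0
    return B1, B2

def hodge_decompose(f, B1, B2):
    """Split an edge signal into gradient, curl, and harmonic parts."""
    # Orthogonal projection onto im(B1^T): the gradient part.
    f_grad = B1.T @ np.linalg.lstsq(B1.T, f, rcond=None)[0]
    # Orthogonal projection onto im(B2): the curl part.
    f_curl = B2 @ np.linalg.lstsq(B2, f, rcond=None)[0]
    # The remainder lies in the kernel of the Hodge 1-Laplacian.
    f_harm = f - f_grad - f_curl
    return f_grad, f_curl, f_harm

# Toy complex (assumed): 4 experts; triangle (0, 1, 2) is filled,
# while the cycle 1-2-3 is left unfilled and acts as a hole.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]
B1, B2 = boundary_matrices(4, edges, triangles)

# Hodge 1-Laplacian on edges.
L1 = B1.T @ B1 + B2 @ B2.T

# Made-up edge signal circulating around the unfilled cycle 1 -> 2 -> 3 -> 1.
f = np.array([0.0, 0.0, 1.0, -1.0, 1.0])
f_grad, f_curl, f_harm = hodge_decompose(f, B1, B2)

# Dimension of the harmonic kernel: edges minus rank(B1) minus rank(B2).
harmonic_dim = (len(edges)
                - np.linalg.matrix_rank(B1)
                - np.linalg.matrix_rank(B2))

print("harmonic dimension:", harmonic_dim)
print("harmonic component:", np.round(f_harm, 3))
```

In this toy setting the harmonic component is nonzero exactly because the cycle 1-2-3 is not bounded by a filled triangle. That is the kind of structure a purely pairwise score cannot register. Which edges count as "harmonic-critical" and how the greedy cover is scored are details the abstract does not give, so they are left out here.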

