SEER-MoE
SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts
Mixture-of-experts routing · first seen Apr 7, 2024
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites SEER-MoE as a baseline.
“SEER-MoE removes experts based on activation frequency and fine-tunes with entropy-based regularization.”
— Less is MoE: Trimming Experts in Domain-Specialist Language Models
“Our Sub-MoE explores the merging paradigm that requires neither searching nor fine-tuning.”
— Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
“expert pruning metrics based on gate statistics collected during decoding. Although these methods actively deal with expert pruning for MoE models, they are still limited to the machine translation domain with linguistic models.”
— Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models
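The first quote above describes two ingredients: pruning experts by activation frequency, and fine-tuning with an entropy-based regularizer. The sketch below illustrates one plausible reading of each using NumPy. It is not the SEER-MoE implementation. The function names, the top-k counting rule, and the use of mean gate entropy as the regularization term are all assumptions made for illustration.

```python
import numpy as np


def expert_activation_counts(gate_logits: np.ndarray, top_k: int) -> np.ndarray:
    """Count how often each expert appears in a token's top-k routing choices.

    gate_logits has shape (num_tokens, num_experts). Hypothetical helper.
    """
    num_experts = gate_logits.shape[1]
    # Indices of the top_k largest logits for every token.
    top_idx = np.argsort(-gate_logits, axis=1)[:, :top_k]
    return np.bincount(top_idx.ravel(), minlength=num_experts)


def experts_to_keep(counts: np.ndarray, num_keep: int) -> np.ndarray:
    """Return the indices of the num_keep most frequently activated experts."""
    order = np.argsort(-counts, kind="stable")
    return np.sort(order[:num_keep])


def gate_entropy(gate_logits: np.ndarray) -> float:
    """Mean per-token entropy of the softmax gate distribution.

    Assumption: a term like this, scaled by some weight, is added to the
    fine-tuning loss. The weight and its sign are not given in the quote.
    """
    shifted = gate_logits - gate_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return float(-(probs * np.log(probs + 1e-12)).sum(axis=1).mean())


# Toy calibration batch: 4 tokens routed over 4 experts (made-up logits).
toy_logits = np.array(
    [
        [5.0, 4.0, 0.0, 0.0],
        [5.0, 0.0, 4.0, 0.0],
        [4.0, 5.0, 0.0, 0.0],
        [5.0, 4.0, 0.0, 0.0],
    ]
)
toy_counts = expert_activation_counts(toy_logits, top_k=2)
toy_kept = experts_to_keep(toy_counts, num_keep=2)
```

In this toy batch, expert 3 is never selected and expert 2 is selected once, so a frequency-based rule that keeps two experts retains experts 0 and 1. In practice the counts would come from a calibration set rather than hand-written logits.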
Beaten on benchmarks
Head-to-head results where a newer method reports beating SEER-MoE. In each pair, the first value is the newer method's score and the second is SEER-MoE's. Values are copied from the source paper's tables; verify against the cited paper.
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [Switch-base-64 on AGNews, 18 GB]
0.915 vs 0.892
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [Switch-base-64 on AGNews, 19 GB]
0.933 vs 0.905
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [Switch-base-64 on AGNews, 20 GB]
0.942 vs 0.919
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [Switch-base-64 on AGNews, 21 GB]
0.945 vs 0.927
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [Switch-base-64 on AGNews, 22 GB]
0.948 vs 0.936
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [DeepSeek-MoE-16B on MMLU, 26 GB]
0.251 vs 0.250
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [DeepSeek-MoE-16B on MMLU, 27.5 GB]
0.312 vs 0.291
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [DeepSeek-MoE-16B on MMLU, 29 GB]
0.394 vs 0.352
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [DeepSeek-MoE-16B on MMLU, 30.5 GB]
0.444 vs 0.371
- HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts
HFedMoE beats SEER-MoE · test accuracy [DeepSeek-MoE-16B on MMLU, 32 GB]
0.457 vs 0.396
- Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models
CAEP (Ours) beats SEER-MoE · AVG [DeepSeek model with 25% experts pruned]
0.612 vs 0.5872
- STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
STUN beats SEER-MoE · Avg [Mixtral-8x7B 25% sparsity]
70.7 vs 56.7
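To summarize the HFedMoE comparisons, the gap between the two methods can be computed per setting and averaged. The snippet below is a small sketch that uses only the accuracy pairs listed above, assuming the first value in each pair belongs to HFedMoE and the second to SEER-MoE. The variable and function names are illustrative.

```python
# (HFedMoE, SEER-MoE) test-accuracy pairs copied from the entries above.
AGNEWS_SWITCH_BASE_64 = [
    (0.915, 0.892),
    (0.933, 0.905),
    (0.942, 0.919),
    (0.945, 0.927),
    (0.948, 0.936),
]
MMLU_DEEPSEEK_MOE_16B = [
    (0.251, 0.250),
    (0.312, 0.291),
    (0.394, 0.352),
    (0.444, 0.371),
    (0.457, 0.396),
]


def gaps(pairs):
    """Absolute accuracy gap (newer method minus SEER-MoE) for each setting."""
    return [round(new - old, 4) for new, old in pairs]


def mean_gap(pairs):
    """Average accuracy gap across the listed settings."""
    values = gaps(pairs)
    return round(sum(values) / len(values), 4)


agnews_mean = mean_gap(AGNEWS_SWITCH_BASE_64)
mmlu_mean = mean_gap(MMLU_DEEPSEEK_MOE_16B)
```

On these listed values the average gap is about 0.021 on AGNews and about 0.040 on MMLU. The smallest MMLU gap (0.251 vs 0.250) is 0.001, so that row is better read as a near tie.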
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- CoX-MoE · CoX-MoE: Coalesced Expert Execution for High-Throughput MoE Inference with AMX-Enabled CPU-GPU Co-Execution · May 18, 2026
- HodgeCover · HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts · May 13, 2026
- dynamic expert replication strategy · Fast MoE Inference via Predictive Prefetching and Expert Replication · May 12, 2026
- Alloc-MoE · Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference · Apr 9, 2026