NAEE
Mixture-of-experts routing
heavily superseded — a standard baseline that newer methods routinely beat
5 papers critique it · 6 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites NAEE as a baseline.
“Our proposed LExI method shares the overarching goal of exploiting expert redundancy to improve efficiency. However, rather than relying on static pruning, it focuses on adaptive expert utilization at inference time, offering a flexible and data-free alternative that preserves task performance while reducing computational cost.”
— LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
“Furthermore, search-based methods like NAEE are ill-suited for fine-grained MoE architectures due to combinatorial explosion.”
— LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
“Our Sub-MoE explores the merging paradigm that requires neither searching nor fine-tuning.”
— Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
“but this approach is limited to models with Top-2 routing and does not generalize to larger expert sets.”
— Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
“While effective on Mixtral-MoE, this brute-force approach does not scale well to modern sparse MoEs.”
— CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis
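The "combinatorial explosion" and "brute-force" critiques above can be made concrete with a small count. The sketch below assumes a search that scores every possible subset of experts to keep in a layer; the function name and the expert counts (8 and 64 per layer, keeping half) are illustrative assumptions, not figures taken from the cited papers.

```python
from math import comb


def subsets_to_evaluate(num_experts: int, num_kept: int) -> int:
    """Count the candidate expert subsets an exhaustive per-layer search would score.

    Hypothetical helper: assumes each layer is searched independently and
    every subset of size `num_kept` is a candidate.
    """
    return comb(num_experts, num_kept)


# Coarse-grained layer (assumed: 8 experts, keep 4).
coarse = subsets_to_evaluate(8, 4)

# Fine-grained layer (assumed: 64 experts, keep 32).
fine = subsets_to_evaluate(64, 32)

print(f"8 experts, keep 4:   {coarse:,} subsets per layer")
print(f"64 experts, keep 32: {fine:,} subsets per layer")
```

Under these assumptions the coarse layer has 70 candidate subsets, which is cheap to enumerate, while the fine-grained layer has more than 10^18, which is not.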
Beaten on benchmarks
Head-to-head results where a newer method reports beating NAEE. Values are copied from the source paper's tables — verify against the cited paper.
- Delta Decompression for MoE-based LLMs Compression
D²-MoE (Ours) beats NAEE · Average [Mixtral-8×7B, 20% compression]
0.60 vs 0.58
- Delta Decompression for MoE-based LLMs Compression
D²-MoE (Ours) beats NAEE · Average [Mixtral-8×7B, 60% compression]
0.52 vs 0.36
- Delta Decompression for MoE-based LLMs Compression
D²-MoE (Ours) beats NAEE · Average [DeepSeekMoE-16B-Base, 20% compression]
0.54 vs 0.53
- Delta Decompression for MoE-based LLMs Compression
D²-MoE (Ours) beats NAEE · Average [DeepSeekMoE-16B-Base, 60% compression]
0.41 vs 0.38
- Delta Decompression for MoE-based LLMs Compression
D²-MoE (Ours) beats NAEE · Average [Phi-3.5-MoE, 40% compression]
0.60 vs 0.57
- Delta Decompression for MoE-based LLMs Compression
D²-MoE (Ours) beats NAEE · Average [Qwen2-57B-A14B, 40% compression]
0.58 vs 0.55
- Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Tokens/sec [Mixtral-8x7B, 60% compression, BSZ=64]
277.72 vs 271.89
- Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Tokens/sec [Mixtral-8x7B, 80% compression, BSZ=64]
313.29 vs 278.53
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
EEP (Prune Only) beats NAEE · Avg. [Num=4]
70.3 vs 60.5
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
EEP (Prune+Merge) beats NAEE · Avg. [Num=4]
74.2 vs 60.5
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
EEP (Prune Only) beats NAEE · Avg. [Num=2]
59.7 vs 47.0
- Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
EEP (Prune+Merge) beats NAEE · Avg. [Num=2]
65.6 vs 47.0
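The head-to-head pairs above are easier to compare as margins. This is a minimal sketch that computes the absolute and relative gap for a few of the rows listed here; the values are copied from this page, and the `margins` helper and row labels are illustrative. The two papers report on different scales (0 to 1 versus percentages), so absolute gaps are only comparable within one paper.

```python
# (label, newer method score, NAEE score), copied from the list above.
rows = [
    ("D2-MoE, Mixtral-8x7B, 20% compression", 0.60, 0.58),
    ("D2-MoE, Mixtral-8x7B, 60% compression", 0.52, 0.36),
    ("EEP (Prune Only), Num=4", 70.3, 60.5),
    ("EEP (Prune+Merge), Num=2", 65.6, 47.0),
]


def margins(newer: float, naee: float) -> tuple[float, float]:
    """Return (absolute gap, gap relative to the NAEE score)."""
    absolute = newer - naee
    return absolute, absolute / naee


for label, newer, naee in rows:
    absolute, relative = margins(newer, naee)
    print(f"{label}: +{absolute:.2f} absolute, +{relative:.1%} relative")
```

The relative gap makes it visible that the margin over NAEE widens at higher compression in the rows sampled here.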
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- CoX-MoE · CoX-MoE: Coalesced Expert Execution for High-Throughput MoE Inference with AMX-Enabled CPU-GPU Co-Execution · May 18, 2026
- HodgeCover · HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts · May 13, 2026
- dynamic expert replication strategy · Fast MoE Inference via Predictive Prefetching and Expert Replication · May 12, 2026
- Alloc-MoE · Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference · Apr 9, 2026