NAEE (Mixture-of-experts routing): heavily superseded, a standard baseline that newer methods routinely beat. 5 papers critique it and 6 beat it on benchmarks, ranking it #3 of 1,370 most-superseded. Sub-problem: cluster led by MC-SMoE. Newer alternatives in the same sub-problem include Less is MoE, TIDE, CoX-MoE, HodgeCover, and the dynamic expert replication strategy.

Heavily superseded · #3 of 1,370 most-superseded

NAEE

Mixture-of-experts routing

heavily superseded — a standard baseline that newer methods routinely beat

5 papers critique it · 6 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites NAEE as a baseline.

“Our proposed LExI method shares the overarching goal of exploiting expert redundancy to improve efficiency. However, rather than relying on static pruning, it focuses on adaptive expert utilization at inference time, offering a flexible and data-free alternative that preserves task performance while reducing computational cost.”
— LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference
“Furthermore, search-based methods like NAEE are ill-suited for fine-grained MoE architectures due to combinatorial explosion.”
— LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
“Our Sub-MoE explores the merging paradigm that requires neither searching nor fine-tuning.”
— Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
“but this approach is limited to models with Top-2 routing and does not generalize to larger expert sets.”
— Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
“While effective on Mixtral-MoE, this brute-force approach does not scale well to modern sparse MoEs.”
— CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis
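
The scaling concern raised in the LightMoE and CAMERA quotes can be made concrete with a little arithmetic. The sketch below assumes the search scores every way of keeping r experts out of n within a single layer. The function name and the 64-expert layer are illustrative choices, not values taken from the cited papers; only the 8-expert case corresponds to Mixtral-8×7B.

```python
from math import comb


def candidate_subsets(num_experts: int, num_kept: int) -> int:
    """Number of distinct expert subsets an exhaustive per-layer search must score."""
    return comb(num_experts, num_kept)


# (experts per layer, experts kept after pruning)
settings = [
    (8, 6),    # Mixtral-style layer, keep 6 of 8
    (8, 4),    # Mixtral-style layer, keep 4 of 8
    (64, 48),  # hypothetical fine-grained layer, keep 48 of 64
    (64, 32),  # hypothetical fine-grained layer, keep 32 of 64
]

for n, r in settings:
    print(f"n={n:>2}, keep={r:>2}: {candidate_subsets(n, r):,} subsets per layer")
```

With 8 experts the search space is 28 or 70 subsets per layer, which is cheap to enumerate. With 64 experts and half of them kept, it exceeds 10^18 subsets per layer, which is the kind of growth the critiques describe as combinatorial explosion.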

Beaten on benchmarks

Head-to-head results where a newer method reports beating NAEE. Values are copied from the source paper's tables — verify against the cited paper.

D²-MoE beats NAEE · Average [Mixtral-8×7B, 20% compression]
0.60 vs 0.58
Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Average [Mixtral-8×7B, 60% compression]
0.52 vs 0.36
Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Average [DeepSeekMoE-16B-Base, 20% compression]
0.54 vs 0.53
Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Average [DeepSeekMoE-16B-Base, 60% compression]
0.41 vs 0.38
Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Average [Phi-3.5-MoE, 40% compression]
0.60 vs 0.57
Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Average [Qwen2-57B-A14B, 40% compression]
0.58 vs 0.55
Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Tokens/sec [Mixtral-8×7B, 60% compression, BSZ=64]
277.72 vs 271.89
Delta Decompression for MoE-based LLMs Compression
D²-MoE beats NAEE · Tokens/sec [Mixtral-8×7B, 80% compression, BSZ=64]
313.29 vs 278.53
Delta Decompression for MoE-based LLMs Compression
EEP (Prune Only) beats NAEE · Avg. [Num=4]
70.3 vs 60.5
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
EEP (Prune+Merge) beats NAEE · Avg. [Num=4]
74.2 vs 60.5
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
EEP (Prune Only) beats NAEE · Avg. [Num=2]
59.7 vs 47.0
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
EEP (Prune+Merge) beats NAEE · Avg. [Num=2]
65.6 vs 47.0
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
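
The rows above mix metrics on different scales (averages between 0 and 1, averages that appear to be percentages, and tokens per second), so the raw gaps are not directly comparable. A minimal sketch that converts each row into a relative gain over NAEE follows. The tuple layout and the `relative_gain` helper are illustrative, and the numbers are the ones listed above, which should still be verified against the cited papers.

```python
# (newer method, setting, newer method's value, NAEE's value), copied from the rows above
rows = [
    ("D²-MoE", "Average, Mixtral-8×7B, 20% compression", 0.60, 0.58),
    ("D²-MoE", "Average, Mixtral-8×7B, 60% compression", 0.52, 0.36),
    ("D²-MoE", "Average, DeepSeekMoE-16B-Base, 20% compression", 0.54, 0.53),
    ("D²-MoE", "Average, DeepSeekMoE-16B-Base, 60% compression", 0.41, 0.38),
    ("D²-MoE", "Average, Phi-3.5-MoE, 40% compression", 0.60, 0.57),
    ("D²-MoE", "Average, Qwen2-57B-A14B, 40% compression", 0.58, 0.55),
    ("D²-MoE", "Tokens/sec, Mixtral-8×7B, 60% compression, BSZ=64", 277.72, 271.89),
    ("D²-MoE", "Tokens/sec, Mixtral-8×7B, 80% compression, BSZ=64", 313.29, 278.53),
    ("EEP (Prune Only)", "Avg., Num=4", 70.3, 60.5),
    ("EEP (Prune+Merge)", "Avg., Num=4", 74.2, 60.5),
    ("EEP (Prune Only)", "Avg., Num=2", 59.7, 47.0),
    ("EEP (Prune+Merge)", "Avg., Num=2", 65.6, 47.0),
]


def relative_gain(new: float, baseline: float) -> float:
    """Percentage improvement of `new` over `baseline` (higher is better for every row)."""
    return (new - baseline) / baseline * 100.0


# Print the rows sorted from the largest to the smallest relative gain.
for method, setting, new, base in sorted(
    rows, key=lambda r: relative_gain(r[2], r[3]), reverse=True
):
    print(f"{relative_gain(new, base):+6.1f}%  {method:<18} {setting}")
```

Read this way, the margins vary widely: the 60% compression Mixtral-8×7B accuracy row is roughly a 44% relative gain, while the 20% compression rows differ from NAEE by only a few percent.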

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.

Less is MoE
TIDE
CoX-MoE
HodgeCover
dynamic expert replication strategy