Is HC-SMoE superseded?

HC-SMoE (mixture-of-experts routing) is superseded: it is cited as a baseline and beaten by newer methods. 3 papers critique it and 4 beat it on benchmarks, placing it #8 of 1,370 most-superseded. Sub-problem: cluster led by MC-SMoE. Newer alternatives in the same sub-problem include Less is MoE, TIDE, CoX-MoE, HodgeCover, and a dynamic expert replication strategy.

What papers say

Verbatim critique sentences, each from a paper that cites HC-SMoE as a baseline.

“simplistic aggregation functions that cannot effectively reconcile these divergent parameter spaces and often require computationally expensive post-merging operations”
— Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
“it can suffer a substantial performance drop on open-ended generation tasks.”
— EvoESAP: Non-Uniform Expert Pruning for Sparse MoE
“This idealized assumption often limits performance.”
— CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis

Beaten on benchmarks

Head-to-head results where a newer method reports beating HC-SMoE. Values are copied from the source papers' tables; verify them against the cited paper.

LightMoE beats HC-SMoE · Average
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
[30% compression] 55.3 vs 46.8
[40% compression] 53.0 vs 38.4
[50% compression] 48.1 vs 34.0

Sub-MoE beats HC-SMoE · Average
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
[Mixtral-8x7B Num=6] 0.64 vs 0.61
[Mixtral-8x7B Num=4] 0.58 vs 0.51
[Qwen1.5-MoE-A2.7B-Chat Num=45] 0.58 vs 0.55
[Qwen1.5-MoE-A2.7B-Chat Num=30] 0.46 vs 0.42
[Qwen3-30B-A3B Num=96] 0.60 vs 0.54
[Qwen3-30B-A3B Num=64] 0.57 vs 0.38
[DeepSeek-MoE-16B Num=48] 0.55 vs 0.53
[DeepSeek-MoE-16B Num=32] 0.49 vs 0.46

REAP beats HC-SMoE · Code Avg
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
[ERNIE-4.5-21B-A3B-PT, 25% compression] 0.512 vs 0.479
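
The head-to-head values above appear to use two different scales (the LightMoE scores look like percentages, the others like fractions), so absolute gaps are not directly comparable across papers. As a rough sketch, the Python snippet below computes the relative gain over HC-SMoE for each row instead. The `RESULTS` list and the helper names `relative_gain` and `largest_gap` are illustrative and do not come from any cited paper; the numbers are simply the ones listed above and should still be checked against the source tables.

```python
# Rows copied from the list above:
# (newer method, setting, newer method's score, HC-SMoE score).
RESULTS = [
    ("LightMoE", "30% compression", 55.3, 46.8),
    ("LightMoE", "40% compression", 53.0, 38.4),
    ("LightMoE", "50% compression", 48.1, 34.0),
    ("Sub-MoE", "Mixtral-8x7B Num=6", 0.64, 0.61),
    ("Sub-MoE", "Mixtral-8x7B Num=4", 0.58, 0.51),
    ("Sub-MoE", "Qwen1.5-MoE-A2.7B-Chat Num=45", 0.58, 0.55),
    ("Sub-MoE", "Qwen1.5-MoE-A2.7B-Chat Num=30", 0.46, 0.42),
    ("Sub-MoE", "Qwen3-30B-A3B Num=96", 0.60, 0.54),
    ("Sub-MoE", "Qwen3-30B-A3B Num=64", 0.57, 0.38),
    ("Sub-MoE", "DeepSeek-MoE-16B Num=48", 0.55, 0.53),
    ("Sub-MoE", "DeepSeek-MoE-16B Num=32", 0.49, 0.46),
    ("REAP", "ERNIE-4.5-21B-A3B-PT, 25% compression", 0.512, 0.479),
]


def relative_gain(newer, baseline):
    """Percentage improvement of the newer score over the baseline score."""
    return (newer - baseline) / baseline * 100.0


def largest_gap(rows):
    """Return the row with the widest relative margin over HC-SMoE."""
    return max(rows, key=lambda r: relative_gain(r[2], r[3]))


if __name__ == "__main__":
    # Print one line per comparison with its scale-free relative gain.
    for method, setting, newer, baseline in RESULTS:
        gain = relative_gain(newer, baseline)
        print(f"{method:9s} {setting:40s} {gain:+6.1f}%")
```

Calling `largest_gap(RESULTS)` returns the comparison with the widest relative margin. Relative gain is used here only because it is unaffected by the scale of the scores; it does not correct for differences in models, benchmarks, or compression settings between papers.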

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.