Expert Choice
Mixture-of-experts routing
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Expert Choice as a baseline.
“Previous strategies like expert-choice anticipated this, but their routing design limit the assignment flexibility to image spatial regions without considering temporal denoising timestep complexity.”
— Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
“Yet, an unacceptable drawback is that it is not suited to casual language modeling due to the reliance on future tokens for the top-k token selection”
— AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
“The Expert Choice approach suffers from token dropping”
— Unified Sparse Mixture of Experts
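The last two critiques (reliance on future tokens, token dropping) follow from how Expert Choice routes: each expert ranks all tokens in a group and keeps its top-k, rather than each token choosing its experts. Below is a minimal sketch of that mechanism, assuming softmax router affinities over experts and a fixed per-expert capacity; the function names and logits are hypothetical and chosen only for illustration, not taken from any of the cited papers. It shows a token picked by no expert (dropped) and an earlier token whose expert assignment changes when only a later token changes (the non-causal dependence).

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def expert_choice_route(logits: List[List[float]], capacity: int) -> List[List[int]]:
    """Expert Choice routing sketch: each expert picks its top-`capacity` tokens.

    logits[t][e] is the (hypothetical) router logit of token t for expert e.
    Returns, for each expert, the sorted indices of the tokens it selected.
    """
    # Token-to-expert affinities: softmax over experts, computed per token.
    scores = [softmax(row) for row in logits]
    num_tokens, num_experts = len(scores), len(scores[0])
    picks: List[List[int]] = []
    for e in range(num_experts):
        # The expert ranks every token in the group, including later positions,
        # so its choice depends on tokens that come after a given token.
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        picks.append(sorted(ranked[:capacity]))
    return picks


def dropped_tokens(picks: List[List[int]], num_tokens: int) -> List[int]:
    """Tokens selected by no expert; they receive no expert computation."""
    chosen = {t for tokens in picks for t in tokens}
    return [t for t in range(num_tokens) if t not in chosen]


# Three tokens, three experts, capacity 1 per expert.
logits = [
    [0.0, 0.0, 0.0],  # token 0: uniform affinity (1/3 per expert)
    [0.0, 0.0, 2.0],  # token 1: strongly prefers expert 2
    [0.5, 0.0, 2.0],  # token 2: also prefers expert 2, but ranks below token 1 there
]
picks = expert_choice_route(logits, capacity=1)
print(picks, dropped_tokens(picks, 3))  # → [[0], [0], [1]] [2]

# Change only the LAST token: token 0 now loses expert 0 to token 2.
logits_alt = logits[:2] + [[3.0, 0.0, 0.0]]
picks_alt = expert_choice_route(logits_alt, capacity=1)
print(picks_alt, dropped_tokens(picks_alt, 3))  # → [[2], [0], [1]] []
```

In the first case token 0 is processed by two experts while token 2 is dropped entirely. In the second, token 0's assignment changed even though only a later token was edited, which is why per-expert top-k selection does not fit left-to-right decoding without modification.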
Beaten on benchmarks
Head-to-head results where a newer method reports beating Expert Choice, listed as newer method vs Expert Choice. Values are copied from the source paper's tables; verify them against the cited paper.
- MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models
  MoE-GRPO w/ LB beats Expert Choice · Avg. [routing methods]: 55.7 vs 54.7
- Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
  Expert Race beats Expert Choice in every reported setting (FID: lower is better; CLIP score: higher is better)
  FID [sigmoid gating]: 13.85 vs 15.73
  CLIP [sigmoid gating]: 22.23 vs 22.06
  FID [identity gating]: 13.66 vs 15.70
  CLIP [identity gating]: 22.25 vs 22.04
  FID [overall]: 8.03 vs 10.13
  CLIP [overall]: 23.09 vs 22.73
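Because FID improves downward while CLIP score improves upward, the raw "X vs Y" pairs are easy to misread. A small sketch that turns each reported pair into a signed margin (positive means the newer method is better) is below. It uses only the values listed above; treating MoE-GRPO's "Avg." as higher-is-better is an assumption, since the underlying table is not reproduced here.

```python
# Reported pairs: (label, newer method value, Expert Choice value, higher_is_better).
# FID is lower-is-better and CLIP score higher-is-better; "Avg." being
# higher-is-better is an assumption.
RESULTS = [
    ("MoE-GRPO w/ LB · Avg. [routing methods]", 55.7, 54.7, True),
    ("Expert Race · FID [sigmoid gating]", 13.85, 15.73, False),
    ("Expert Race · CLIP [sigmoid gating]", 22.23, 22.06, True),
    ("Expert Race · FID [identity gating]", 13.66, 15.70, False),
    ("Expert Race · CLIP [identity gating]", 22.25, 22.04, True),
    ("Expert Race · FID [overall]", 8.03, 10.13, False),
    ("Expert Race · CLIP [overall]", 23.09, 22.73, True),
]


def margin(new: float, baseline: float, higher_is_better: bool) -> float:
    """Signed improvement of the newer method over Expert Choice (positive = better)."""
    diff = new - baseline if higher_is_better else baseline - new
    return round(diff, 2)


for label, new, baseline, higher in RESULTS:
    print(f"{label}: margin {margin(new, baseline, higher):+.2f}")
```

Every margin comes out positive, consistent with each entry being reported as a win for the newer method.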
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.