Superseded baseline · #200 of 1,370 most-superseded
auxiliary losses
Mixture-of-experts routing
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 0 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites auxiliary losses as a baseline.
“multiple studies [dai2022stablemoe, wu2024gw, wang2024auxiliary] demonstrate that auxiliary losses can significantly impair training stability and model performance”
— Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts
“The reliance on auxiliary losses requires careful balancing between the router loss and the task loss, which introduces trade-offs”
— Unified Sparse Mixture of Experts
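To make the "careful balancing" critique concrete, here is a minimal sketch of one common form of auxiliary loss, the load-balancing term popularized by Switch Transformer, alpha * N * sum_i(f_i * P_i), added to the task loss. It is not taken from either paper quoted above. The function names, the top-1 routing rule, and the default alpha = 0.01 are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, alpha=0.01):
    """Illustrative auxiliary loss: alpha * N * sum_i f_i * P_i.

    router_logits: one list of N expert logits per token.
    f_i: fraction of tokens whose top-1 expert is i (assumed top-1 routing).
    P_i: mean router probability assigned to expert i.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # Count how many tokens pick each expert as their top-1 choice.
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    f = [c / num_tokens for c in counts]

    # Average router probability per expert across all tokens.
    mean_prob = [sum(p[i] for p in probs) / num_tokens
                 for i in range(num_experts)]

    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, mean_prob))


def total_loss(task_loss, router_logits, alpha=0.01):
    """Task loss plus the weighted auxiliary term; alpha sets the trade-off."""
    return task_loss + load_balancing_loss(router_logits, alpha)


# Two tokens, two experts: balanced routing versus collapsed routing.
balanced = [[10.0, 0.0], [0.0, 10.0]]
collapsed = [[10.0, 0.0], [10.0, 0.0]]
```

In this sketch, routing that collapses onto one expert is penalized more than balanced routing. The coefficient alpha is the knob the second quote refers to: it sets how strongly the router loss competes with the task loss for the same gradients.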
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.