CLIP-MoE
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Mixture-of-experts routing · first seen Sep 28, 2024
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 2 papers beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites CLIP-MoE as a baseline.
“However, scaling this method is challenging, as adding more experts increases complexity and necessitates additional training stages.”
— CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling
Beaten on benchmarks
Head-to-head results where a newer method reports beating CLIP-MoE. Values are copied from the source paper's tables — verify against the cited paper.
- ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization
CLIP-ERMoE beats CLIP-MoE · Recall@1 [COCO I2T Recall@1]
65.4 vs 65.0
- ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization
CLIP-ERMoE beats CLIP-MoE · Recall@5 [COCO I2T Recall@5]
88.3 vs 86.0
- ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization
CLIP-ERMoE beats CLIP-MoE · Recall@10 [COCO I2T Recall@10]
94.7 vs 92.0
- Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling
Upcycling beats CLIP-MoE · Avg. [ViT-B/32]
39.6 vs 38.2
- Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling
Upcycling beats CLIP-MoE · Avg. [ViT-B/16]
43.5 vs 42.8
- Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling
Upcycling beats CLIP-MoE · 5-shot [ViT-B/16 5-shot]
51.5 vs 51.3
- Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling
Upcycling beats CLIP-MoE · 10-shot [ViT-B/16 10-shot]
58.2 vs 58.0
- Enhancing Mixture-of-Experts Specialization via Cluster-Aware Upcycling
Upcycling beats CLIP-MoE · FT [ViT-B/16 FT]
73.3 vs 73.2
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- MetaMoE · MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification · May 14, 2026
- Apr 20, 2026
- BERT-MoE Framework · Aspect-Based Sentiment Analysis for Future Tourism Experiences: A BERT-MoE Framework for Persian User Reviews · Feb 13, 2026
- null experts within token-choice MoE · Improving MoE Compute Efficiency by Composing Weight and Data Sparsity · Jan 21, 2026
- MixtureKit · MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models · Dec 13, 2025
- ERMoE · ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization · Nov 14, 2025
- Dirichlet-Prior Shaping Loss (DPSL) · Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs · Oct 1, 2025
- Symphony-MoE · Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts · Sep 23, 2025