LEMoE
LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models
Mixture-of-experts routing · first seen Jun 28, 2024
Superseded: cited as a baseline and beaten by newer methods.
3 papers critique it · 1 beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites LEMoE as a baseline.
“However, LEMoE functions as a parameter-preserving framework that attaches external modules to a frozen backbone (typically dense). It addresses routing consistency within the added adaptor rather than the routing distribution shift of the base model itself.”
— MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
“Additionally, since the routers in these methods are fine-tuned, even if previous experts are frozen, there can still be modifications to past knowledge.”
— LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models
“LEMoE is based on MoE, but its greedy routing harms old experts' influence when integrating new ones.”
— Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
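The LLaVA-CMoE critique rests on a simple mechanism: freezing expert weights does not freeze the function computed for old inputs if the router that mixes those experts keeps training. The toy sketch below assumes a dense softmax router over two frozen scalar experts; every name and value in it is hypothetical and is not taken from LEMoE's implementation. It only shows that an old input's output can change when nothing but the router weights is updated.

```python
import math

# Toy MoE layer: two frozen "experts" (fixed scalar functions) mixed by a
# linear softmax router. Hypothetical illustration, not LEMoE's architecture.
EXPERTS = [lambda x: 2.0 * x, lambda x: -1.0 * x]  # frozen: never updated


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights):
    """Dense softmax mixture: sum over experts of gate_i(x) * expert_i(x)."""
    gates = softmax([w * x for w in router_weights])
    return sum(g * f(x) for g, f in zip(gates, EXPERTS))


old_input = 1.0
router_before = [1.0, 0.0]  # hypothetical router state after an earlier edit
router_after = [1.0, 2.0]   # same router, fine-tuned toward expert 1 for a new edit

before = moe_forward(old_input, router_before)
after = moe_forward(old_input, router_after)
print(f"old input output: {before:.3f} -> {after:.3f}")
```

Only `router_weights` differs between the two calls, yet the output for `old_input` changes sign, which is the kind of drift in past knowledge the critique describes.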
Beaten on benchmarks
Head-to-head results where a newer method reports beating LEMoE. Values are copied from the source paper's tables; verify them against the cited paper.
- Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
  LiveEdit beats LEMoE · Average [LLaVA (7B)]
  1 edit, E-VQA: 95.36 vs 94.52
  1 edit, VLKEB: 97.08 vs 90.90
  10 edits, E-VQA: 94.68 vs 78.66
  10 edits, VLKEB: 96.26 vs 79.43
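As a quick arithmetic check on the results above, the sketch below computes LiveEdit's margin over LEMoE in each setting, using only the reported Average scores. The dictionary layout and the `margins` helper are illustrative choices, not anything from the cited paper.

```python
# Average scores copied from the benchmark list above: (LiveEdit, LEMoE).
# Keys are (number of edits, benchmark); the model is LLaVA (7B) throughout.
RESULTS = {
    (1, "E-VQA"): (95.36, 94.52),
    (1, "VLKEB"): (97.08, 90.90),
    (10, "E-VQA"): (94.68, 78.66),
    (10, "VLKEB"): (96.26, 79.43),
}


def margins(results):
    """Return LiveEdit's lead over LEMoE per setting, rounded to 2 decimals."""
    return {
        setting: round(live_edit - lemoe, 2)
        for setting, (live_edit, lemoe) in results.items()
    }


if __name__ == "__main__":
    for (edits, bench), gap in margins(RESULTS).items():
        print(f"{edits:>2} edit(s), {bench}: +{gap:.2f}")
```

The margins are 0.84 and 6.18 points at 1 edit versus 16.02 and 16.83 points at 10 edits, so the gap in these reported numbers widens as edits accumulate.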
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- PARAMΔ Integration into Upcycled MoE · A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAMΔ Integration into Upcycled MoE · May 18, 2026
- MEMIT-like framework for MoE · Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates · May 15, 2026
- May 11, 2026
- May 8, 2026
- Apr 28, 2026
- CoGR-MoE · CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering · Apr 18, 2026
- Apr 2, 2026
- On Token's Dilemma · On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models · Mar 29, 2026
- Mixture-of-Experts (MoE) and Mixture-of-Linear-Experts (MoLE) architectures for MLIPs · Scaling Machine Learning Interatomic Potentials with Mixtures of Experts · Mar 9, 2026
- Mar 5, 2026
- Feb 13, 2026
- Multiscale Interaction Mixture of Experts (MI-MoE) · Topology-Aware Multiscale Mixture of Experts for Efficient Molecular Property Prediction · Jan 19, 2026