LG, AI, NE | Sep 29, 2025

EOE: Evolutionary Optimization of Experts for Training Language Models

arXiv:2509.24436v1 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the high computational cost of training large language models, making it more feasible for deployment on PCs and edge devices, though it is incremental as it builds on existing expert and evolutionary methods.

The paper tackles the problem of reducing memory usage and increasing throughput in large language model training by dividing the model into experts and using evolutionary operators to optimize them, achieving over ten times throughput acceleration while maintaining nearly the same accuracy as the full model.

This paper presents an evolutionary framework for training large language models (LLMs). The model is divided into several experts (sub-networks) that share the same structure but have different parameter values. Only one expert is trained at each step. After the classical AdamW update, evolutionary operators (crossover, PSO, and mutation) act on the tensor weights of the current expert and the best expert, so the current expert learns from the experience of the best expert, and the direction of the best expert helps the current expert's loss decrease faster. At the end, only the weights of the best expert are saved. Experiments show that the best expert achieves nearly the same accuracy as the full model, which greatly reduces the size of the model for inference. Since only one expert is trained at each step, training needs much less memory and has much higher throughput; experiments show a throughput speedup of more than ten times. Our source code is available. It is a pure C++/CUDA framework, suitable for easy deployment on PCs and edge computing devices.
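The training loop described in the abstract can be illustrated with a toy sketch. This is not the authors' C++/CUDA implementation: the abstract gives no formulas, so the operator definitions, all function and parameter names (`crossover`, `pso_step`, `mutate`, `inertia`, `attraction`, and so on), the round-robin choice of expert, and the rule that a candidate is kept only if it does not increase the loss are all assumptions made for illustration. Plain gradient descent on a quadratic loss stands in for AdamW on a language-modeling loss.

```python
import random


def loss(weights, target):
    """Toy quadratic loss standing in for the language-modeling loss."""
    return sum((w - t) ** 2 for w, t in zip(weights, target)) / len(weights)


def gradient_step(weights, target, lr):
    """Stand-in for the AdamW update: plain gradient descent on the toy loss."""
    n = len(weights)
    return [w - lr * 2.0 * (w - t) / n for w, t in zip(weights, target)]


def crossover(current, best, rate, rng):
    """Uniform crossover: copy each weight from the best expert with probability `rate`."""
    return [b if rng.random() < rate else c for c, b in zip(current, best)]


def pso_step(current, velocity, best, inertia, attraction, rng):
    """PSO-style update: the velocity is pulled toward the best expert's weights."""
    new_velocity = [inertia * v + attraction * rng.random() * (b - c)
                    for c, v, b in zip(current, velocity, best)]
    new_weights = [c + v for c, v in zip(current, new_velocity)]
    return new_weights, new_velocity


def mutate(current, rate, scale, rng):
    """Gaussian mutation: perturb each weight with probability `rate`."""
    return [c + rng.gauss(0.0, scale) if rng.random() < rate else c
            for c in current]


def train(num_experts=4, dim=8, steps=200, seed=0):
    rng = random.Random(seed)
    target = [rng.uniform(-1, 1) for _ in range(dim)]
    # Experts share one structure (a vector of length `dim`) but differ in values.
    experts = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
    velocities = [[0.0] * dim for _ in range(num_experts)]
    initial_best = min(loss(e, target) for e in experts)

    for step in range(steps):
        i = step % num_experts  # assumption: experts are visited round-robin
        experts[i] = gradient_step(experts[i], target, lr=0.5)
        best = min(range(num_experts), key=lambda k: loss(experts[k], target))
        if i != best:
            cand, vel = pso_step(experts[i], velocities[i], experts[best],
                                 inertia=0.5, attraction=0.5, rng=rng)
            cand = crossover(cand, experts[best], rate=0.1, rng=rng)
            cand = mutate(cand, rate=0.05, scale=0.01, rng=rng)
            # Assumption: keep the candidate only if it does not increase the loss.
            if loss(cand, target) <= loss(experts[i], target):
                experts[i], velocities[i] = cand, vel

    # Only the best expert is kept at the end.
    best_weights = min(experts, key=lambda e: loss(e, target))
    return initial_best, loss(best_weights, target), best_weights


if __name__ == "__main__":
    start, final, _ = train()
    print(f"best loss before: {start:.6f}, after: {final:.6f}")
```

Because only one expert is updated per step, only that expert's weights and optimizer state need to be active at a time, which is the source of the memory savings the abstract claims. The toy loop mirrors that structure but makes no attempt to reproduce the reported throughput.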

