CL · Jul 23, 2025

Megrez2 Technical Report

arXiv:2507.17728v1
1 citation
Originality: Highly original
AI Analysis

This work addresses the need for efficient, deployable AI models in real-world applications. It offers a novel method for a known bottleneck rather than a foundational breakthrough.

The paper tackles the problem of deploying large language models on resource-constrained devices. It introduces Megrez2, a lightweight architecture with cross-layer expert sharing and pre-gated routing. The resulting model, with 3B activated parameters, achieves competitive or superior performance compared to larger models on tasks such as language understanding and code generation.

We present Megrez2, a novel lightweight and high-performance language model architecture optimized for device-native deployment. Megrez2 introduces a cross-layer expert sharing mechanism, which significantly reduces total parameter count by reusing expert modules across adjacent transformer layers while maintaining most of the model's capacity. It also incorporates pre-gated routing, enabling memory-efficient expert loading and faster inference. As the first instantiation of the Megrez2 architecture, we introduce the Megrez2-Preview model, which is pre-trained on a 5-trillion-token corpus and further enhanced through supervised fine-tuning and reinforcement learning with verifiable rewards. With only 3B activated and 7.5B stored parameters, Megrez2-Preview demonstrates competitive or superior performance compared to larger models on a wide range of tasks, including language understanding, instruction following, mathematical reasoning, and code generation. These results highlight the effectiveness of the Megrez2 architecture in balancing accuracy, efficiency, and deployability, making it a strong candidate for real-world, resource-constrained applications.
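To make the abstract's distinction between stored and activated parameters concrete, the following is a minimal sketch of how reusing one expert pool across adjacent layers could shrink the stored count while leaving the per-token activated count unchanged. The function names, the `share_group` parameter, and every number in the toy configuration are assumptions made for illustration; they are not taken from the paper and do not describe the Megrez2-Preview settings.

```python
def expert_param_counts(num_layers, experts_per_pool, params_per_expert,
                        top_k, share_group):
    """Return (stored, activated) expert parameter counts.

    share_group is the number of adjacent layers that reuse one expert
    pool (hypothetical knob; share_group=1 means no sharing).
    """
    if num_layers % share_group != 0:
        raise ValueError("num_layers must be divisible by share_group")
    num_pools = num_layers // share_group
    # Only one copy of each pool is stored, however many layers use it.
    stored = num_pools * experts_per_pool * params_per_expert
    # Every layer still routes each token to top_k experts, so the
    # activated count does not depend on how pools are shared.
    activated = num_layers * top_k * params_per_expert
    return stored, activated


def pool_index(layer, share_group):
    """Map a layer to the expert pool shared by its group of adjacent layers."""
    return layer // share_group


# Hypothetical toy configuration, not the Megrez2-Preview settings.
baseline = expert_param_counts(12, 8, 1_000_000, top_k=2, share_group=1)
shared = expert_param_counts(12, 8, 1_000_000, top_k=2, share_group=3)
print("no sharing:", baseline)
print("groups of 3:", shared)
```

In this toy setup, sharing across groups of three layers cuts stored expert parameters to a third while the activated count stays the same, which is the kind of gap between activated and stored parameters the abstract reports. Attention and embedding parameters are ignored here for simplicity.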

