CL | AI | May 10, 2025

xGen-small Technical Report

arXiv:2505.06496v1 | 1 citation | h-index: 27
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for optimized models for long-context tasks, but it appears incremental as it builds on existing Transformer and training methodologies.

The paper tackled the challenge of developing efficient Transformer decoder models for long-context applications, resulting in xGen-small models that achieve strong performance in math and coding domains and excel at long-context benchmarks.

We introduce xGen-small, a family of 4B and 9B Transformer decoder models optimized for long-context applications. Our vertically integrated pipeline unites domain-balanced, frequency-aware data curation; multi-stage pre-training with quality annealing and length extension to 128k tokens; and targeted post-training via supervised fine-tuning, preference learning, and online reinforcement learning. xGen-small delivers strong performance across various tasks, especially in math and coding domains, while excelling at long-context benchmarks.

