CL AI May 10, 2025

xGen-small Technical Report

Erik Nijkamp, Bo Pang, Egor Pakhomov, Akash Gokul, Jin Qu, Silvio Savarese, Yingbo Zhou, Caiming Xiong

arXiv:2505.06496v1 4.91 citations h-index: 27

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for models optimized for long-context tasks, but it appears incremental because it builds on existing Transformer architectures and training methodologies.

The paper tackles the challenge of developing efficient Transformer decoder models for long-context applications. The resulting xGen-small models achieve strong performance in math and coding domains and excel at long-context benchmarks.

We introduce xGen-small, a family of 4B and 9B Transformer decoder models optimized for long-context applications. Our vertically integrated pipeline unites domain-balanced, frequency-aware data curation; multi-stage pre-training with quality annealing and length extension to 128k tokens; and targeted post-training via supervised fine-tuning, preference learning, and online reinforcement learning. xGen-small delivers strong performance across various tasks, especially in math and coding domains, while excelling at long-context benchmarks.
