CV | AI | LG | Jan 20, 2025

Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models

Zhiqi Li, Guo Chen, Shilong Liu, Shihao Wang, Vibashan VS, Yishen Ji, Shiyi Lan, Hao Zhang, Yilin Zhao, Subhashree Radhakrishnan, Nadine Chang, Karan Sapra

arXiv:2501.14818v1 | 35.371 citations | h-index: 58 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the limited transparency of data strategies in the open-source VLM community, offering an incremental advance in the form of detailed insights and training recipes.

The paper tackles this opacity by building a post-training data strategy from scratch. The resulting Eagle2-9B model achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.

Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community. Our introduced data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.
