CV | AI | LG | Jan 20, 2025

Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models

arXiv:2501.14818v1 | 60 citations | h-index: 58 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the limited transparency of data strategies in the open-source VLM community, offering an incremental advance through detailed insights and training recipes.

The paper tackles the opacity of data strategies in open-source vision-language models by building a post-training data strategy from scratch. The resulting Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.

Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community. Our introduced data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.

Code Implementations: 1 repo
