AI · Jul 8, 2025

BlueLM-2.5-3B Technical Report

Baojiao Xiong, Boheng Chen, Chengzhi Wang, Daxiong Luo, Dongsheng Xu, Dongyang Liu, Fan Yang, Fangyuan Li, Fei Teng, Feng Wang, Fukang Qin, Fuquan Peng

Baidu, Tencent

arXiv:2507.05934v1 · 9.63 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the need for high-performance, on-device multimodal AI models, though it appears incremental as it builds on existing MLLM paradigms with specific efficiency improvements.

The paper develops a compact multimodal large language model for efficient edge-device deployment. With only 2.9 billion parameters, the model in thinking mode performs comparably to Qwen3-4B on text-only benchmarks and trails the larger Kimi-VL-A3B-16B by only about 5% on average in multimodal evaluations.

We present BlueLM-2.5-3B, a compact and unified dense Multimodal Large Language Model (MLLM) designed for efficient edge-device deployment, offering strong general-purpose and reasoning capabilities. To the best of our knowledge, this is the first 3B-scale MLLM to support both thinking and non-thinking modes, while also enabling explicit control over thinking token budget. BlueLM-2.5-3B is developed through diversified data curation, key data resampling, hybrid heterogeneous reinforcement learning, and a high-performance training infrastructure. Our model achieves superior multimodal capacity while preserving competitive pure-text performance with only 2.9 billion parameters. We conduct comprehensive evaluations across a broad range of multimodal and text-only benchmarks. In thinking mode, BlueLM-2.5-3B achieves comparable performance to Qwen3-4B on text-only benchmarks, and trails the larger Kimi-VL-A3B-16B by only about 5% on average across multimodal evaluations. In non-thinking mode, it outperforms Qwen2.5-VL-3B on the majority of multimodal benchmarks. Additionally, BlueLM-2.5-3B exhibits exceptional data efficiency. All of the aforementioned performance is achieved with substantially less total training data than Qwen2.5-VL-3B and Qwen3-4B. We hope our work contributes to the advancement of high-performance, on-device MLLMs and provides meaningful insights to the research community.
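The abstract mentions explicit control over the thinking token budget but does not say how it is enforced. As a rough illustration only, the sketch below shows one common way such a cap can work in a decoding loop: once the number of thinking tokens reaches the budget, the loop inserts the end-of-thinking marker itself instead of asking the model for another token. The marker strings, the `generate` function, and `toy_model` are all hypothetical stand-ins, not BlueLM-2.5-3B's tokens or API.

```python
from typing import Callable, List

# Hypothetical markers for illustration; not taken from the report.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"
EOS = "<eos>"


def generate(next_token: Callable[[List[str]], str],
             prompt: List[str],
             budget: int,
             max_len: int = 64) -> List[str]:
    """Decode with at most `budget` tokens between the thinking markers."""
    out = [THINK_OPEN]
    used = 0          # thinking tokens produced so far
    thinking = True   # still inside the thinking span
    while len(out) < max_len:
        if thinking and used >= budget:
            # Budget spent: force the end of thinking instead of sampling.
            tok = THINK_CLOSE
        else:
            tok = next_token(prompt + out)
        out.append(tok)
        if tok == EOS:
            break
        if thinking:
            if tok == THINK_CLOSE:
                thinking = False
            else:
                used += 1
    return out


def toy_model(context: List[str]) -> str:
    """Scripted stand-in for a model: five thinking steps, then an answer."""
    if THINK_CLOSE not in context:
        return "step" if context.count("step") < 5 else THINK_CLOSE
    return EOS if context[-1] == "answer" else "answer"


capped = generate(toy_model, ["question"], budget=2)
uncapped = generate(toy_model, ["question"], budget=10)
print(capped)
print(uncapped)
```

With the toy model, a budget of 2 cuts the five scripted thinking steps down to two, while a budget of 10 lets the thinking span end on its own. Because the forced marker is appended to the context, the answer that follows is conditioned on the truncated reasoning.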
