Mar 25, 2025

Gemma 3 Technical Report

DeepMind, MIT
arXiv:2503.19786v1
1446 citations
h-index: 102
Originality: Incremental advance
AI Analysis

This work provides incremental improvements in lightweight open models for broader AI applications, enhancing capabilities like math, chat, and multilingual tasks.

The authors introduced Gemma 3, a multimodal lightweight open model with vision understanding, multilingual support, and long context up to 128K tokens, achieving superior performance to Gemma 2 and making Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks.

We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
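The KV-cache argument in the abstract can be made concrete with a rough estimate. The sketch below is not taken from the report: the function name `kv_cache_bytes`, the layer pattern (a fixed number of local layers followed by one global layer), and every numeric setting are hypothetical values chosen for illustration. It assumes a global layer caches keys and values for the full context, while a local layer caches only its sliding window.

```python
def kv_cache_bytes(num_layers, local_per_global, window, context_len,
                   num_kv_heads, head_dim, bytes_per_value=2):
    """Rough KV-cache size in bytes for one sequence (illustrative only).

    Assumes layers repeat as `local_per_global` local layers followed by
    one global layer; local_per_global=0 means every layer is global.
    """
    group = local_per_global + 1
    num_global = num_layers // group
    num_local = num_layers - num_global

    # Each cached token stores one key and one value vector per KV head.
    per_token = 2 * num_kv_heads * head_dim * bytes_per_value

    global_tokens = context_len               # full context is cached
    local_tokens = min(window, context_len)   # only the sliding window is cached
    return per_token * (num_global * global_tokens + num_local * local_tokens)


# Hypothetical configuration, not the actual Gemma 3 settings.
all_global = kv_cache_bytes(12, 0, 1024, 8192, num_kv_heads=4, head_dim=64)
interleaved = kv_cache_bytes(12, 5, 1024, 8192, num_kv_heads=4, head_dim=64)
print(f"all global:  {all_global / 2**20:.1f} MiB")
print(f"interleaved: {interleaved / 2**20:.1f} MiB")
```

Under these assumptions, the global-layer term grows linearly with context length while the local-layer term stops growing once the context exceeds the window, which is why a higher local-to-global ratio and a short window both shrink the cache.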

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
