CL · Jan 13

Ministral 3

arXiv:2601.08584v1 · 69 citations · h-index: 27
Originality: Synthesis-oriented
AI Analysis

This work makes language models more accessible to resource-limited users, though it appears incremental because it builds on existing parameter-efficient and distillation techniques.

The authors address the need for parameter-efficient dense language models in compute- and memory-constrained applications. They introduce the Ministral 3 series in three sizes (3B, 8B, and 14B parameters), each released as a pretrained base model, an instruction-finetuned model, and a reasoning model, all derived through Cascade Distillation.
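To make "memory constrained" concrete, the sketch below estimates weight storage as parameter count times bytes per parameter. It is a back-of-the-envelope illustration, not a figure from the paper: the nominal sizes (3e9, 8e9, 14e9) are taken from the model names, the precisions listed are assumptions, and activations, the KV cache, and the image-understanding components are ignored.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: nominal parameter counts from the model names; real counts may differ.
# Ignores activations, KV cache, and any vision components.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

MODEL_SIZES = {"3B": 3e9, "8B": 8e9, "14B": 14e9}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def footprint_table() -> dict:
    """Map each model size to its estimated weight memory per precision."""
    return {
        name: {p: weight_memory_gb(n, p) for p in BYTES_PER_PARAM}
        for name, n in MODEL_SIZES.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        print(name, {p: f"{gb:.1f} GB" for p, gb in row.items()})
```

Under these assumptions, halving the bytes per parameter halves the weight footprint, which is why a smaller dense model and lower-precision storage are complementary ways to fit a fixed memory budget.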

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, a technique that iterates pruning and continued training with distillation. Each model comes with image understanding capabilities, and all are released under the Apache 2.0 license.
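The abstract describes Cascade Distillation only as iterated pruning followed by continued training with distillation. The sketch below shows one possible shape of that loop and is not the authors' implementation. It assumes that each smaller model is pruned from the previously produced one, that the model being pruned also serves as the teacher, and that the distillation objective is a forward KL divergence between teacher and student token distributions. The `prune` and `train_with_distillation` callables are hypothetical placeholders supplied by the caller.

```python
import math
from typing import Any, Callable, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Forward KL(teacher || student) for a single token position.

    Assumption: a plain KL objective; the paper may use a different loss.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cascade(
    parent: Any,
    target_sizes: Sequence[Any],
    prune: Callable[[Any, Any], Any],
    train_with_distillation: Callable[[Any, Any], Any],
) -> List[Any]:
    """Prune the current model to the next size, then continue training it.

    `prune` and `train_with_distillation` are hypothetical hooks. Using the
    model being pruned as the teacher is an assumption of this sketch.
    """
    produced = []
    current = parent
    for size in target_sizes:
        student = prune(current, size)                        # shrink the model
        student = train_with_distillation(student, current)   # recover quality
        produced.append(student)
        current = student                                     # next stage starts here
    return produced
```

Calling `cascade(parent, ["14B", "8B", "3B"], prune, train_with_distillation)` with real implementations of the two hooks would yield one model per target size, largest first. The point of the cascade structure in this sketch is that each stage starts from already-trained weights rather than from scratch.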
