CL · Jan 13

Ministral 3

Alexander H. Liu, Kartik Khandelwal, Sandeep Subramanian, Victor Jouault, Abhinav Rastogi, Adrien Sadé, Alan Jeffares, Albert Jiang, Alexandre Cahill, Alexandre Gavaudan, Alexandre Sablayrolles, Amélie Héliou

arXiv:2601.08584v1 · 18.369 citations · h-index: 27

Originality: Synthesis-oriented

AI Analysis

This work makes capable language models more accessible to resource-limited users, though it appears incremental, since it builds on existing parameter-efficient and distillation techniques.

The authors address the problem of building parameter-efficient dense language models for compute- and memory-constrained applications. They introduce the Ministral 3 series in three sizes (3B, 8B, and 14B parameters), each released as a general-purpose base model, an instruction-finetuned model, and a reasoning model, all derived through Cascade Distillation.
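To make "memory-constrained" concrete, the sketch below does back-of-the-envelope arithmetic for the weight memory of each nominal size at common numeric precisions. It assumes the nominal counts (3B, 8B, 14B) are exact and counts weights only; KV cache, activations, any vision-encoder parameters, and runtime overhead are ignored, and GB here means 10^9 bytes. The precision table is an illustrative assumption, not something the paper states.

```python
# Nominal parameter counts of the three Ministral 3 sizes (assumed exact here;
# real checkpoints may differ, e.g. because of the image encoder).
NOMINAL_PARAMS = {"3B": 3e9, "8B": 8e9, "14B": 14e9}

# Bytes per parameter for a few common storage precisions (illustrative).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    # Weights only: ignores KV cache, activations, and framework overhead.
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in NOMINAL_PARAMS.items():
    row = ", ".join(f"{d}: {weight_memory_gb(n, d):.1f} GB" for d in BYTES_PER_PARAM)
    print(f"{name}: {row}")
```

For example, the 14B model needs roughly 28 GB just for bf16 weights, while the 3B model fits in about 6 GB, which is why the smaller sizes target constrained hardware.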

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, an iterative technique of pruning followed by continued training with distillation. Each model comes with image understanding capabilities, and all are released under the Apache 2.0 license.
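The abstract describes Cascade Distillation only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' recipe. It pairs a standard temperature-scaled logit-distillation loss (KL divergence from teacher to student, scaled by T²) with a hypothetical driver loop in which each stage prunes the current model to the next smaller size and then continues training the pruned student with distillation from the model it came from. The stage order (largest to smallest), the choice of teacher, and every name here (`cascade_distillation`, `prune`, `distill_train`, the `"parent"` starting model) are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")  # stands in for whatever a "model" is in a real pipeline


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 1.0) -> float:
    # Standard logit distillation: KL(teacher || student) on softened
    # distributions, scaled by T^2. Assumed; the paper's exact loss is not given.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


def cascade_distillation(parent: M, target_sizes: Sequence[str],
                         prune: Callable[[M, str], M],
                         distill_train: Callable[[M, M], M]) -> List[M]:
    # Hypothetical driver: prune the current model to the next size, then
    # continue training the pruned student with distillation from its source.
    derived: List[M] = []
    teacher = parent
    for size in target_sizes:
        student = prune(teacher, size)
        student = distill_train(student, teacher)
        derived.append(student)
        teacher = student  # the next, smaller stage starts from this model
    return derived


# Toy stand-ins: a "model" is just a dict recording its lineage.
def toy_prune(model: dict, size: str) -> dict:
    return {"size": size, "pruned_from": model["size"], "distilled": False}


def toy_distill_train(student: dict, teacher: dict) -> dict:
    return {**student, "distilled": True, "teacher": teacher["size"]}


family = cascade_distillation({"size": "parent"}, ["14B", "8B", "3B"],
                              toy_prune, toy_distill_train)
for m in family:
    print(m)
```

Passing `prune` and `distill_train` as callables keeps the cascade structure separate from the (unspecified) pruning criterion and training loop, so either could be swapped without touching the driver.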
