Ministral 3
Ministral 3 makes language models more accessible to resource-limited users, though the contribution appears incremental because it builds on existing parameter-efficient and distillation techniques.
The authors address the problem of building parameter-efficient dense language models for compute- and memory-constrained applications. They introduce the Ministral 3 series in three sizes (3B, 8B, and 14B parameters), each with a pretrained base variant, an instruction-finetuned variant, and a reasoning variant, all created using Cascade Distillation.
We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, a technique that iterates pruning and continued training with distillation. Each model comes with image understanding capabilities, and all are released under the Apache 2.0 license.
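The abstract describes Cascade Distillation only at a high level, so the following is a minimal sketch of the two ingredients it names, not the authors' recipe. It assumes a standard forward-KL distillation loss between teacher and student token distributions, and it assumes that each smaller model is derived from the next larger one (14B to 8B, then 8B to 3B). The function names `distillation_loss` and `cascade_schedule` and the `temperature` parameter are hypothetical, introduced here for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) for one token position.

    Assumption: a plain KL objective; the source does not state the loss used.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def cascade_schedule(sizes_in_billions):
    """Pair each model size with the next smaller size derived from it.

    Assumption: the cascade proceeds from largest to smallest, one step at a time.
    """
    ordered = sorted(sizes_in_billions, reverse=True)
    return list(zip(ordered, ordered[1:]))


if __name__ == "__main__":
    for parent, child in cascade_schedule([3, 8, 14]):
        print(f"prune {parent}B -> {child}B, then continue training with distillation")
    print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is the signal continued training would minimize after each pruning step.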