SE | LG | Jan 15

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

AI2, Amazon, Meta AI
arXiv:2601.11659v1 | 12 citations | h-index: 50
Originality: Synthesis-oriented
AI Analysis

It offers a technical reference for researchers and practitioners needing precise facts about Llama 4, but is incremental as it compiles existing information without new contributions.

This document consolidates publicly reported technical details about Meta's Llama 4 model family, summarizing its architecture, training, evaluation, and deployment to provide a compact reference for researchers and practitioners.

This document consolidates publicly reported technical details about Meta's Llama 4 model family. It summarizes (i) released variants (Scout and Maverick) and the broader herd context, including the previewed Behemoth teacher model, (ii) architectural characteristics beyond a high-level MoE description, covering routed/shared-expert structure, early-fusion multimodality, and long-context design elements reported for Scout (iRoPE and length generalization strategies), (iii) training disclosures spanning pre-training, mid-training for long-context extension, and post-training methodology (lightweight SFT, online RL, and lightweight DPO) as described in release materials, (iv) developer-reported benchmark results for both base and instruction-tuned checkpoints, and (v) practical deployment constraints observed across major serving environments, including provider-specific context limits and quantization packaging. The manuscript also summarizes licensing obligations relevant to redistribution and derivative naming, and reviews publicly described safeguards and evaluation practices. The goal is to provide a compact technical reference for researchers and practitioners who need precise, source-backed facts about Llama 4.
