CL · AI · Feb 11

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

arXiv:2602.10604v1 · 15 citations · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses the problem of deploying sophisticated AI agents in real-world industrial environments by balancing high intelligence with efficiency, though it appears incremental in combining existing techniques like MoE and reinforcement learning.

The paper tackles the challenge of combining frontier-level agentic intelligence with computational efficiency. It introduces Step 3.5 Flash, a sparse Mixture-of-Experts model with 11B active parameters, which scores 85.4% on IMO-AnswerBench and 86.4% on LiveCodeBench-v6, results the authors describe as comparable to frontier models such as GPT-5.2 xHigh.

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
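To make the efficiency claims concrete, the following is a minimal Python sketch of two quantities the abstract mentions: the fraction of parameters active per token (11B of 196B) and a 3:1 sliding-window/full attention layout. The abstract gives only the ratio, so the layer count and the ordering within each group of four (three sliding-window layers followed by one full-attention layer) are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
# Figures quoted in the abstract (in billions of parameters).
TOTAL_PARAMS_B = 196
ACTIVE_PARAMS_B = 11


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for each token in a sparse MoE."""
    return active_b / total_b


def attention_schedule(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Per-layer attention types for an interleaved layout.

    ASSUMPTION: each group is `swa_per_full` sliding-window ("swa") layers
    followed by one full-attention ("full") layer. The abstract states only
    the 3:1 ratio, not the ordering or the number of layers.
    """
    period = swa_per_full + 1
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active parameters per token: {frac:.1%}")

    # Hypothetical 8-layer stack, chosen only to show the pattern.
    print(attention_schedule(8))
```

Under these assumptions, roughly one parameter in eighteen is active per token, and only one layer in four attends over the full context, which is the source of the latency and cost savings the abstract attributes to the design.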

