CL, AI | Feb 11

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Ailin Huang, Ang Li, Aobo Kong, Bin Wang, Binxing Jiao, Bo Dong, Bojun Wang, Boyu Chen, Brian Li, Buyun Ma, Chang Su, Changxin Miao

arXiv:2602.10604v17.021 citations | h-index: 48

Originality: Incremental advance

AI Analysis

This work addresses the problem of deploying sophisticated AI agents in real-world industrial environments by balancing high intelligence with efficiency, though it appears incremental in combining existing techniques like MoE and reinforcement learning.

The paper tackles the challenge of achieving frontier-level agentic intelligence with computational efficiency by introducing Step 3.5 Flash, a sparse Mixture-of-Experts model with 11B active parameters. The model scores 85.4% on IMO-AnswerBench and 86.4% on LiveCodeBench-v6, results the authors describe as comparable to frontier models such as GPT-5.2 xHigh.

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
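To make two of the abstract's figures concrete, the following is a minimal illustrative sketch, not the authors' implementation. It computes the share of parameters active per token (11B of 196B) and builds a per-layer attention schedule with three sliding-window layers for every full-attention layer. The function names, the 8-layer depth, and the choice to place the full-attention layer last in each group of four are assumptions made for illustration; the abstract does not specify them.

```python
def attention_schedule(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Build a per-layer attention-type list.

    Assumption: each group of (swa_per_full + 1) layers ends with the
    full-attention layer. The abstract gives only the 3:1 ratio, not the order.
    """
    period = swa_per_full + 1
    return [
        "full" if (i + 1) % period == 0 else "sliding_window"
        for i in range(num_layers)
    ]


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Fraction of parameters used per token in a sparse MoE model."""
    return active_params_b / total_params_b


# Hypothetical 8-layer stack, chosen only to show two full periods.
schedule = attention_schedule(8)
print(schedule)

# Figures taken from the abstract: 11B active out of 196B total.
print(f"active fraction: {active_fraction(11, 196):.1%}")  # active fraction: 5.6%
```

Under these assumptions, only about one layer in four attends over the full context, and roughly 5.6% of the weights participate in each forward pass, which is the sense in which the abstract pairs a large foundation with efficient inference.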
