CL AI LG, Feb 2

Kimi K2.5: Visual Agentic Intelligence

Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, S. H. Cai, Yuan Cao, Y. Charles, H. S. Che, Cheng Chen, Guanduo Chen, Huarong Chen, Jia Chen

arXiv:2602.02276v130.0247 citations | h-index: 14 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses the need for efficient and capable AI agents in research and applications, though it appears incremental because it builds on existing multimodal foundations.

The paper aims to advance general agentic intelligence by introducing Kimi K2.5, an open-source multimodal model that reports state-of-the-art results across coding, vision, reasoning, and agentic tasks. Its Agent Swarm framework reduces latency by up to 4.5 times over single-agent baselines.

We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that the two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across various domains, including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to $4.5\times$ over single-agent baselines. We release the post-trained Kimi K2.5 model checkpoint to facilitate future research and real-world applications of agentic intelligence.
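To make the idea of concurrent sub-problem execution concrete, here is a minimal sketch in Python. It is not the Agent Swarm implementation, whose details the abstract does not give. The names `SUBTASKS`, `run_subagent`, `run_swarm`, and `ideal_speedup`, as well as the simulated costs, are hypothetical and chosen only for illustration. The sketch shows why running independent sub-tasks in parallel can cut latency: the wall-clock time is bounded by the slowest sub-task rather than by the sum of all of them.

```python
import asyncio

# Hypothetical sub-tasks: (name, simulated cost in milliseconds).
# These values are illustrative assumptions, not figures from the paper.
SUBTASKS = [("search", 30), ("code", 20), ("vision", 10)]


async def run_subagent(name: str, cost_ms: int) -> str:
    """Stand-in for one sub-agent; sleeps to simulate its work."""
    await asyncio.sleep(cost_ms / 1000)
    return f"{name}:done"


async def run_swarm(subtasks):
    """Run all sub-agents concurrently; results are returned in input order."""
    return await asyncio.gather(*(run_subagent(n, c) for n, c in subtasks))


def ideal_speedup(subtasks) -> float:
    """Best-case speedup for fully independent sub-tasks:
    total sequential cost divided by the cost of the slowest sub-task."""
    costs = [c for _, c in subtasks]
    return sum(costs) / max(costs)


results = asyncio.run(run_swarm(SUBTASKS))
print(results)
print(ideal_speedup(SUBTASKS))
```

In this toy setting the best-case speedup depends only on how evenly the work splits. Real orchestration would also pay for task decomposition, coordination, and dependencies between sub-problems, none of which are modeled here.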

