CLFeb 2Code
Kimi K2.5: Visual Agentic IntelligenceKimi Team, Tongtong Bai, Yifan Bai et al.
We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across various domains including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to $4.5\times$ over single-agent baselines. We release the post-trained Kimi K2.5 model checkpoint to facilitate future research and real-world applications of agentic intelligence.
LGOct 19, 2022
Knowledge-Enhanced Relation Extraction DatasetYucong Lin, Hongming Xiao, Jiani Liu et al.
Recently, knowledge-enhanced methods leveraging auxiliary knowledge graphs have emerged in relation extraction, surpassing traditional text-based approaches. However, to our best knowledge, there is currently no public dataset available that encompasses both evidence sentences and knowledge graphs for knowledge-enhanced relation extraction. To address this gap, we introduce the Knowledge-Enhanced Relation Extraction Dataset (KERED). KERED annotates each sentence with a relational fact, and it provides knowledge context for entities through entity linking. Using our curated dataset, We compared contemporary relation extraction methods under two prevalent task settings: sentence-level and bag-level. The experimental result shows the knowledge graphs provided by KERED can support knowledge-enhanced relation extraction methods. We believe that KERED offers high-quality relation extraction datasets with corresponding knowledge graphs for evaluating the performance of knowledge-enhanced relation extraction methods. Our dataset is available at: \url{https://figshare.com/projects/KERED/134459}
DCJan 28
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on SuperchipsJiahuan Yu, Mingtao Hu, Zichao Lin et al.
Large Language Model (LLM) serving faces a fundamental tension between stringent latency Service Level Objectives (SLOs) and limited GPU memory capacity. When high request rates exhaust the KV cache budget, existing LLM inference systems often suffer severe head-of-line (HOL) blocking. While prior work explored PCIe-based offloading, these approaches cannot sustain responsiveness under high request rates, often failing to meet tight Time-To-First-Token (TTFT) and Time-Between-Tokens (TBT) SLOs. We present SuperInfer, a high-performance LLM inference system designed for emerging Superchips (e.g., NVIDIA GH200) with tightly coupled GPU-CPU architecture via NVLink-C2C. SuperInfer introduces RotaSched, the first proactive, SLO-aware rotary scheduler that rotates requests to maintain responsiveness on Superchips, and DuplexKV, an optimized rotation engine that enables full-duplex transfer over NVLink-C2C. Evaluations on GH200 using various models and datasets show that SuperInfer improves TTFT SLO attainment rates by up to 74.7% while maintaining comparable TBT and throughput compared to state-of-the-art systems, demonstrating that SLO-aware scheduling and memory co-design unlocks the full potential of Superchips for responsive LLM serving.