OSDCLGMAApr 14

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

arXiv:2604.2696390.7
Predicted impact top 4% in OS · last 90 daysOriginality Incremental advance
AI Analysis

For systems deploying LLM-based autonomous agents, MARS addresses the critical challenge of coordinating heterogeneous GPU-CPU resources in multi-turn agentic loops.

MARS is a co-scheduling system for heterogeneous agentic workloads that reduces end-to-end latency by up to 5.94x while maintaining near-maximal throughput, and accelerates task completion in the OpenHands coding agent by up to 1.87x.

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn inference to multi-turn LLM-tool loops, and a spatial shift from chat-scale, GPU-only execution to repository-scale, GPU-CPU co-located execution. Consequently, coordinating heterogeneous resource demands of agentic execution has emerged as a critical system challenge. We design and implement MARS, an efficient and adaptive co-scheduling system that globally coordinates heterogeneous agentic workloads under coupled GPU-CPU resource pressure. By establishing holistic visibility across GPU inference and CPU tool execution via a unified information stream, an external control plane in MARS decouples admission from execution to prevent heterogeneous resource oversubscription. An internal agent-centric scheduler further minimizes the end-to-end critical path by prioritizing latency-sensitive continuations and adaptively retaining KV cache state only when warm resumption yields a latency benefit. Our evaluations show that MARS reduces end-to-end latency by up to 5.94x while maintaining nearly maximal system throughput. We further integrate MARS as the serving backend for the OpenHands coding agent framework, demonstrating its real-world effectiveness by accelerating end-to-end task completion time by up to 1.87x. Our source code will be publicly available soon.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes