DC · May 25

Agentic AI Workload Characteristics

arXiv:2605.2629766.2
Predicted impact: top 13% in DC · last 90 days · Originality: Incremental advance
AI Analysis

For system designers and researchers building LLM serving infrastructure, this work provides the first detailed characterization of agentic workloads, revealing bottlenecks that differ from traditional prompt-generation serving.

This paper characterizes ReAct-style agentic AI workloads. It finds that they are decode-dominated with high KV-cache reuse, and that tool use shifts over time from read/explore to execute/write behavior. Together, these findings point to the need to jointly manage model re-entry, persistent context, and tool execution.
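
To make the caching claim concrete, here is a minimal Python sketch of per-episode token accounting. It assumes an idealized prefix cache in which the entire context from the previous model call, including that call's output, is still resident in the KV cache when the next call arrives. The `Turn`, `Accounting`, and `account` names and the example token counts are hypothetical, not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    new_input: int  # tokens appended before this model call (e.g., a tool observation)
    output: int     # tokens the model decodes in this call


@dataclass(frozen=True)
class Accounting:
    prompt_tokens: int   # sum of full context lengths sent across all calls
    cached_tokens: int   # prompt tokens served from a reused KV-cache prefix
    prefill_tokens: int  # prompt tokens that still need a prefill pass
    decode_tokens: int   # tokens generated autoregressively

    @property
    def reuse_fraction(self) -> float:
        return self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0


def account(system_prompt: int, turns: list[Turn]) -> Accounting:
    """Token accounting for one agent episode under an idealized prefix cache
    (assumption: nothing is evicted between calls)."""
    context = system_prompt
    prompt = cached = prefill = decode = 0
    for i, turn in enumerate(turns):
        reused = context if i > 0 else 0  # the first call starts with a cold cache
        context += turn.new_input         # append the new observation / input
        prompt += context
        cached += reused
        prefill += context - reused       # only the uncached suffix is prefilled
        decode += turn.output
        context += turn.output            # the output becomes part of the next prefix
    return Accounting(prompt, cached, prefill, decode)


if __name__ == "__main__":
    acct = account(system_prompt=1000, turns=[Turn(new_input=300, output=100)] * 5)
    print(acct, f"reuse={acct.reuse_fraction:.2f}")
```

With these made-up numbers, about 76% of prompt tokens come from the cache, and after the first call each call prefills only its 300 new tokens. Decode tokens, by contrast, are produced one forward step at a time, so a workload shaped like this can end up decode-dominated in wall-clock time even when it decodes fewer tokens than it prefills.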

Agentic AI shifts LLM serving from isolated prompt-generation requests to stateful, multi-turn executions that repeatedly invoke the model, call tools, and grow context over time. This paper characterizes ReAct-style agents from both the LLM-serving and tool-execution perspectives using an end-to-end tracing infrastructure across reasoning and non-reasoning Gemma and Qwen configurations on five agentic benchmarks. Our study shows that agentic workloads are not simply long-prompt workloads: with effective context caching, most input tokens are reused across turns, making execution decode-dominated while increasing dependence on long-lived KV-cache state. We also find that tool use has a clear temporal structure, with agents shifting from read/explore behavior early in execution to execute/write behavior later. These results show that efficient agentic serving must jointly manage repeated model re-entry, persistent context state, and workload-dependent tool behavior.
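
The read/explore to execute/write shift can be measured with a simple position-binned profile of tool calls. This sketch assumes a hypothetical tool taxonomy (`READ_EXPLORE`, `EXECUTE_WRITE`) and a made-up trace; the paper's actual tool categories and binning may differ.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical tool taxonomy: real agent frameworks use their own tool names.
READ_EXPLORE = {"list_dir", "read_file", "grep", "search"}
EXECUTE_WRITE = {"edit_file", "write_file", "run_command", "run_tests"}

# A made-up episode, listed in call order (not data from the paper).
EXAMPLE_TRACE = [
    "list_dir", "read_file", "grep",
    "read_file", "edit_file", "read_file",
    "run_tests", "edit_file", "run_tests",
]


def category(tool: str) -> str:
    """Map a tool name onto a coarse behavior class."""
    if tool in READ_EXPLORE:
        return "read/explore"
    if tool in EXECUTE_WRITE:
        return "execute/write"
    return "other"


def phase_profile(calls: list[str], n_bins: int = 3) -> list[Counter]:
    """Count categories in n_bins equal slices of normalized episode position."""
    bins = [Counter() for _ in range(n_bins)]
    n = len(calls)
    for i, tool in enumerate(calls):
        # i / n lies in [0, 1), so this index is always in range(n_bins).
        bins[i * n_bins // n][category(tool)] += 1
    return bins


def read_share(profile: list[Counter]) -> list[float]:
    """Fraction of read/explore among classified calls per bin (NaN if none)."""
    shares = []
    for counts in profile:
        classified = counts["read/explore"] + counts["execute/write"]
        shares.append(counts["read/explore"] / classified if classified else float("nan"))
    return shares


if __name__ == "__main__":
    print(read_share(phase_profile(EXAMPLE_TRACE)))
```

On the example trace, the read/explore share falls from 1.0 in the first third to 0.0 in the last third. Normalizing position per episode lets episodes of different lengths be averaged bin by bin.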
