VineLM: Trie-Based Fine-Grained Control for Agentic Workflows

Nikos Pagonas, Matthew Lou, Tianyi Peng, Dan Rubenstein, Kostis Kaffes

arXiv:2605.2391479.4

AI Analysis

For developers of agentic workflows, VineLM provides fine-grained runtime control to optimize accuracy under cost/latency constraints, addressing the limitation of static workflow-level plans.

VineLM introduces a trie-based workflow manager that dynamically selects models for each stage invocation in agentic workflows, improving accuracy by up to 18% at the same cost-latency budget while reducing offline profiling costs by 98-99.8%.

Agentic workflows interleave configurable LLM stages with tool stages and often include retries or refinement loops. Existing workflow managers profile full workflow configurations offline and assign each request a static workflow-level plan that binds each configurable LLM stage to a single model, reuses that model across repeated loop iterations, and does not revisit those choices at runtime. We present VineLM, a workflow manager that enables fine-grained control by choosing the model for each stage invocation as execution unfolds under request-level objectives such as maximizing accuracy under cost or latency budgets. VineLM represents feasible executions as an annotated trie of model-choice prefixes and uses checkpointing and cascade profiling to estimate path accuracy, cost, and latency without exhaustively profiling every request on every path. At runtime, VineLM re-roots the trie after each stage invocation and replans over the remaining subtrie using the realized execution prefix and remaining latency budget. On NL2SQL and math reasoning workflows, VineLM improves the cost-latency-accuracy frontier over coarse workflow-level baselines, achieving up to 18% higher accuracy at the same per-request budget. Its sparse profiling reduces offline profiling cost by 98-99.8% compared with exhaustive profiling.
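The re-root-and-replan idea described in the abstract can be illustrated with a small sketch. This is not VineLM's implementation: the `Node` class, the `plan` and `reroot` functions, the model names "small" and "large", and all cost, latency, and accuracy numbers are hypothetical, and a plain exhaustive depth-first search stands in for whatever planner the paper uses. Loops, checkpointing, and cascade profiling are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One model-choice prefix in the trie (hypothetical structure)."""
    cost: float = 0.0        # estimated cost of the invocation leading here
    latency: float = 0.0     # estimated latency of that invocation
    accuracy: Optional[float] = None  # estimated accuracy, set on leaves
    children: Dict[str, "Node"] = field(default_factory=dict)


def plan(node: Node, cost_budget: float,
         latency_budget: float) -> Optional[Tuple[float, List[str]]]:
    """Return (accuracy, model choices) of the best feasible path below node."""
    if not node.children:
        return None if node.accuracy is None else (node.accuracy, [])
    best = None
    for model, child in node.children.items():
        if child.cost > cost_budget or child.latency > latency_budget:
            continue  # this invocation alone would exceed a budget
        sub = plan(child, cost_budget - child.cost,
                   latency_budget - child.latency)
        if sub is not None and (best is None or sub[0] > best[0]):
            best = (sub[0], [model] + sub[1])
    return best


def reroot(node: Node, model: str, realized_cost: float,
           realized_latency: float, cost_budget: float,
           latency_budget: float) -> Tuple[Node, float, float]:
    """Move to the chosen child and charge the realized (not estimated) usage."""
    return (node.children[model],
            cost_budget - realized_cost,
            latency_budget - realized_latency)


def leaf_pair(acc_small: float, acc_large: float) -> Dict[str, Node]:
    # Second-stage choices; all numbers are made up for illustration.
    return {
        "small": Node(cost=1.0, latency=1.0, accuracy=acc_small),
        "large": Node(cost=4.0, latency=3.0, accuracy=acc_large),
    }


# Two-stage workflow with two candidate models per stage.
root = Node(children={
    "small": Node(cost=1.0, latency=1.0, children=leaf_pair(0.60, 0.75)),
    "large": Node(cost=4.0, latency=3.0, children=leaf_pair(0.70, 0.90)),
})

# Initial plan under a cost budget of 6 and a latency budget of 5.
first_plan = plan(root, cost_budget=6.0, latency_budget=5.0)
print(first_plan)  # (0.75, ['small', 'large'])

# Stage 1 ran on "small" but took 3 time units instead of the estimated 1,
# so the remaining latency budget no longer admits "large" for stage 2.
node, cost_left, latency_left = reroot(root, "small", 1.0, 3.0, 6.0, 5.0)
second_plan = plan(node, cost_left, latency_left)
print(second_plan)  # (0.6, ['small'])
```

In this toy setting the static plan would have committed to "large" for the second stage and overrun the latency budget, while replanning from the new root falls back to "small". The exhaustive search is only practical for tiny tries; it is used here to keep the example short.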
