Latency and Cost of Multi-Agent Intelligent Tutoring at Scale
Provides concrete deployment guidance for institutions choosing between throughput tiers for multi-agent tutoring systems at scales from a single seminar to university-wide rollout.
The paper evaluates the latency and cost of a multi-agent LLM tutoring system (ITAS) across three throughput tiers and concurrency levels up to 50 users. It finds that Priority PayGo keeps response times under 4 seconds under load, while Provisioned Throughput offers the lowest latency at low concurrency but saturates above ~20 users. Cost analysis shows that, under a worst-case usage ceiling, both pay-per-token tiers cost less per student per semester than a STEM textbook.
Multi-agent LLM tutoring systems improve response quality through agent specialization. However, each student query triggers several concurrent API calls, and a parallel phase completes only when its slowest call returns. Latency therefore compounds through a maximum effect that single-agent systems do not face. We instrument ITAS, a four-agent tutoring system built on Gemini 2.5 Flash and Google Vertex AI, across three throughput tiers (Standard PayGo, Priority PayGo, and Provisioned Throughput) and eleven concurrency levels up to 50 simultaneous users, yielding over 3,000 requests drawn from a live graduate STEM deployment. Priority PayGo maintains flat sub-4-second response times across the full load range. Standard PayGo degrades substantially under classroom-scale concurrency. Provisioned Throughput delivers the lowest latency at low concurrency but saturates its reserved capacity above approximately 20 concurrent users. Cost analysis places both pay-per-token tiers well below the price of a STEM textbook per student per semester under a worst-case usage ceiling. Provisioned Throughput is expensive under continuous provisioning, but it becomes cost-competitive for institutions that can predict their traffic and concentrate it to reach high utilization. These results provide concrete tier-selection guidance across deployment scales from a single seminar to a university-wide rollout.
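The parallel-phase maximum effect can be illustrated with a small Monte Carlo sketch. This is not the ITAS measurement method. It assumes, hypothetically, that each API call's latency is independent and log-normally distributed, with illustrative parameters `mu` and `sigma` that are not taken from the paper. It also assumes that `num_parallel` agent calls run in one parallel phase that waits for the slowest call. The function name `phase_latency` is likewise hypothetical.

```python
import random
import statistics


def phase_latency(num_parallel: int, trials: int = 10_000,
                  mu: float = 0.0, sigma: float = 0.5,
                  seed: int = 0) -> tuple[float, float]:
    """Estimate mean single-call latency vs. mean parallel-phase latency.

    Hypothetical model: each call's latency (seconds) is drawn independently
    from a log-normal distribution with parameters mu and sigma. A parallel
    phase finishes only when its slowest call returns, so its latency is the
    maximum of num_parallel draws.
    """
    rng = random.Random(seed)  # seeded for reproducible estimates
    single, phase = [], []
    for _ in range(trials):
        draws = [rng.lognormvariate(mu, sigma) for _ in range(num_parallel)]
        single.append(draws[0])   # what a single-agent call would wait for
        phase.append(max(draws))  # what the parallel phase waits for
    return statistics.mean(single), statistics.mean(phase)


for k in (1, 2, 4):
    s, p = phase_latency(k)
    print(f"k={k}: single-call mean {s:.2f}s, parallel-phase mean {p:.2f}s")
```

Under these assumptions, the mean single-call latency stays roughly constant as `k` grows, but the parallel-phase mean rises. Adding agents to a phase therefore raises the expected wait even when no individual call gets slower.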