Why Agent Caching Fails and How to Fix It: Structured Intent Canonicalization with Few-Shot Learning

arXiv:2602.18922v1

Originality Highly original

AI Analysis

This addresses cost inefficiency in AI agents, offering a novel caching solution with broad practical impact.

The paper tackles the high cost of repeated LLM calls in personal AI agents by identifying that existing caching methods fail due to optimizing for the wrong property, and introduces a structured intent decomposition framework (W5H2) with few-shot learning, achieving 91.1% accuracy on MASSIVE and projecting a 97.5% cost reduction.

Personal AI agents incur substantial cost via repeated LLM calls. We show existing caching methods fail: GPTCache achieves 37.9% accuracy on real benchmarks; APC achieves 0-12%. The root cause is optimizing for the wrong property -- cache effectiveness requires key consistency and precision, not classification accuracy. We observe cache-key evaluation reduces to clustering evaluation and apply V-measure decomposition to separate these on n=8,682 points across MASSIVE, BANKING77, CLINC150, and NyayaBench v2, our new 8,514-entry multilingual agentic dataset (528 intents, 20 W5H2 classes, 63 languages). We introduce W5H2, a structured intent decomposition framework. Using SetFit with 8 examples per class, W5H2 achieves 91.1%+/-1.7% on MASSIVE in ~2ms -- vs 37.9% for GPTCache and 68.8% for a 20B-parameter LLM at 3,447ms. On NyayaBench v2 (20 classes), SetFit achieves 55.3%, with cross-lingual transfer across 30 languages. Our five-tier cascade handles 85% of interactions locally, projecting 97.5% cost reduction. We provide risk-controlled selective prediction guarantees via RCPS with nine bound families.

View on arXiv PDF

Similar