CL · Apr 11

CodeComp: Structural KV Cache Compression for Agentic Coding

arXiv:2604.10235 · 77.4 · h-index: 7
Predicted impact: top 76% in CL · last 90 days
Originality: Incremental advance
AI Analysis

For developers using LLM-based coding agents, CodeComp addresses a practical memory bottleneck: it lets agents process long codebases under tight KV cache budgets while recovering most of the accuracy of full-context inference.

CodeComp introduces a training-free KV cache compression method that uses static program analysis (Code Property Graphs) to preserve structurally critical tokens, outperforming attention-only baselines on bug localization and code generation benchmarks under tight memory constraints.

Agentic code tasks such as fault localization and patch generation require processing long codebases under tight memory constraints, where the Key-Value (KV) cache becomes the primary inference bottleneck. Existing compression methods rely exclusively on attention signals to estimate token importance, systematically discarding structurally critical tokens such as call sites, branch conditions, and assignments that are essential for code understanding. We present CodeComp, a training-free KV cache compression framework that incorporates static program analysis into LLM inference via Code Property Graph priors extracted by Joern. Across bug localization and code generation benchmarks, CodeComp consistently outperforms attention-only compression baselines under equal memory budgets, recovering the majority of full-context accuracy under aggressive KV cache compression, while matching the patch generation quality of uncompressed full-context inference and integrating seamlessly into SGLang-based agentic coding pipelines without model modification.
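
To make the idea concrete, here is a minimal sketch of how a structural prior could change which tokens survive a fixed KV cache budget. It is not CodeComp's actual scoring rule, which the abstract does not specify. The function name `select_kept_tokens`, the additive `prior_weight` bonus, and the toy scores are all assumptions made for illustration. In a real pipeline the structural positions would come from a Code Property Graph (for example, one extracted by Joern) mapped onto token indices.

```python
from typing import List, Set


def select_kept_tokens(
    attention_scores: List[float],
    structural_positions: Set[int],
    budget: int,
    prior_weight: float = 1.0,
) -> List[int]:
    """Return the sorted positions of tokens to keep in the KV cache.

    Hypothetical rule: attention score plus a fixed bonus for tokens
    flagged as structurally critical. This is an illustrative assumption,
    not the formula used by CodeComp.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    combined = [
        score + (prior_weight if pos in structural_positions else 0.0)
        for pos, score in enumerate(attention_scores)
    ]
    # Rank by combined score, highest first; ties go to the earlier position.
    ranked = sorted(range(len(combined)), key=lambda pos: (-combined[pos], pos))
    return sorted(ranked[:budget])


# Toy example: ten tokens, a budget of four cache slots.
attention = [0.30, 0.05, 0.02, 0.10, 0.02, 0.01, 0.04, 0.20, 0.03, 0.02]
structural = {6, 8}  # e.g. tokens of a branch condition (made-up positions)

attention_only = select_kept_tokens(attention, set(), budget=4)
with_prior = select_kept_tokens(attention, structural, budget=4)
print(attention_only)  # [0, 1, 3, 7]
print(with_prior)      # [0, 6, 7, 8]
```

In this toy run, attention-only selection evicts positions 6 and 8 because their attention scores are low, while the structural bonus keeps them at the cost of two weaker tokens. This mirrors the failure mode the abstract describes for attention-only baselines.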

