Cross-Lingual Token Arbitrage: Optimizing Code Agent Context Windows via Local LLM Preprocessing
For developers using AI coding agents, this work reduces inference costs without sacrificing coding quality, offering a proactive alternative to reactive compression methods.
The paper addresses input-token cost bottlenecks in AI-assisted coding agents by introducing a pre-flight, edge-side prompt-rewriting middleware that performs cross-lingual translation and structural rewriting. The method reduces prompt tokens by 34-47% and total tokens by up to 18.8% while preserving or improving task accuracy across three commercial LLM backends.
AI-assisted coding agents are bottlenecked by input-token cost. Two pathologies of raw human input drive much of this overhead: tokenization inefficiency for non-English text and structural entropy in conversational prompts. Existing approaches act reactively, either compressing already-bloated contexts or intervening after failures occur. We introduce a pre-flight, edge-side prompt-rewriting middleware that operates between the developer and the cloud agent. A local Llama 3.2 (3B) model performs cross-lingual translation into English and structural rewriting into a compact task-oriented format, with regex-validated rewrite-with-fallback safeguards ensuring that the optimized prompt is never larger than the original. We evaluate on OMH-Polyglot, a multilingual coding benchmark spanning Turkish, Arabic, Chinese, and code-switched specifications. Across three commercial LLM backends, the middleware reduces prompt tokens by 34-47% and total tokens by up to 18.8% while preserving or improving task accuracy. Ablation studies show that the gains arise primarily from the rewriting stage rather than from simple function-name extraction. Compared with LLMLingua-2 at matched compression rates, our method achieves a superior OckScore on all evaluated backends. These results demonstrate that proactive prompt optimization can substantially reduce inference costs without sacrificing coding quality.
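The rewrite-with-fallback safeguard described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `optimize_prompt`, `count_tokens`, `extract_identifiers`, and `IDENT_RE` are hypothetical, the regex-based token counter is a crude stand-in for a real tokenizer, and the specific validation rule (every code identifier in the original must survive the rewrite) is an assumption about what "regex-validated" might involve. The local model is represented by an arbitrary `rewrite` callable.

```python
import re
from typing import Callable

# Assumed validation pattern: identifiers in backticks or directly before "(".
IDENT_RE = re.compile(r"`([^`]+)`|\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts words and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def extract_identifiers(text: str) -> set:
    """Collect code identifiers matched by IDENT_RE."""
    return {quoted or called for quoted, called in IDENT_RE.findall(text)}


def optimize_prompt(
    original: str,
    rewrite: Callable[[str], str],
    count: Callable[[str], int] = count_tokens,
) -> str:
    """Return the rewritten prompt only if it passes validation and is smaller.

    `rewrite` stands in for the local model call (hypothetical interface).
    """
    try:
        candidate = rewrite(original).strip()
    except Exception:
        return original  # local model failed: fall back to the raw prompt
    if not candidate:
        return original  # empty rewrite: fall back
    if not extract_identifiers(original) <= extract_identifiers(candidate):
        return original  # an identifier was lost: fall back
    if count(candidate) >= count(original):
        return original  # no token saving: fall back
    return candidate
```

Because every failure path returns the original prompt, the output under this sketch can never contain more tokens (by the chosen counter) than the input, which mirrors the "never larger than the original" guarantee stated above. In practice the counter would presumably be the target backend's own tokenizer rather than a regex.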