SE | AI | HC | Nov 24, 2025

Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds

arXiv:2511.18842v1 | 1 citation
Originality: Incremental advance
AI Analysis

This work addresses a practical bottleneck for developers using LLM-based code assistants by optimizing suggestion timing to enhance efficiency and cost-effectiveness, representing an incremental improvement over existing methods.

The paper tackles the problem of when to present LLM code suggestions so that they neither interrupt the developer nor waste inference calls. It proposes an adaptive timing mechanism that, in a two-month deployment with professional developers, raised suggestion acceptance from 4.9% to 18.6% and cut wasted inference calls by 75%.

Large Language Models (LLMs) have transformed code auto-completion by generating context-aware suggestions. Yet deciding when to present these suggestions remains underexplored, often leading to interruptions or wasted inference calls. We propose an adaptive timing mechanism that dynamically adjusts the delay before offering a suggestion based on real-time developer feedback. Our method combines a logistic transform of recent acceptance rates with a bounded delay range, anchored by a high-level binary prediction of the developer's cognitive state. In a two-month deployment with professional developers, our system improved suggestion acceptance from 4.9% with no delay to 15.4% with static delays, and to 18.6% with adaptive timing, while reducing blind rejections (rejections without being read) from 8.3% to 0.36%. Together, these improvements increase acceptance and reduce wasted inference calls by 75%, making LLM-based code assistants more efficient and cost-effective in practice.
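
The abstract names three ingredients: a logistic transform of recent acceptance rates, a bounded delay range, and a binary cognitive-state prediction that anchors the range. It does not give the formula, so the following is only a minimal sketch of how such a mechanism might be wired together. The class name `AdaptiveDelay`, the window size, the steepness and midpoint of the logistic curve, and the two delay ranges are all hypothetical choices for illustration, not values from the paper.

```python
import math
from collections import deque


class AdaptiveDelay:
    """Illustrative sketch only; all parameters are assumptions, not the paper's."""

    def __init__(self, idle_bounds_ms=(200.0, 1500.0),
                 busy_bounds_ms=(1500.0, 4000.0),
                 window=20, steepness=8.0, midpoint=0.2):
        # Assumption: the binary state prediction selects one of two delay ranges.
        self.bounds = {False: idle_bounds_ms, True: busy_bounds_ms}
        # Sliding window of recent outcomes (1.0 = accepted, 0.0 = rejected).
        self.outcomes = deque(maxlen=window)
        self.steepness = steepness
        self.midpoint = midpoint

    def record(self, accepted):
        """Store the developer's response to the latest suggestion."""
        self.outcomes.append(1.0 if accepted else 0.0)

    def acceptance_rate(self):
        """Recent acceptance rate; falls back to the midpoint with no history."""
        if not self.outcomes:
            return self.midpoint
        return sum(self.outcomes) / len(self.outcomes)

    def delay_ms(self, developer_busy):
        """Map the acceptance rate through a logistic curve into the active range."""
        low, high = self.bounds[bool(developer_busy)]
        x = self.steepness * (self.acceptance_rate() - self.midpoint)
        s = 1.0 / (1.0 + math.exp(-x))  # in (0, 1); higher when acceptance is high
        # Higher acceptance -> shorter delay, never leaving [low, high].
        return high - s * (high - low)


timer = AdaptiveDelay()
for accepted in [False, False, True, False]:
    timer.record(accepted)
print(round(timer.delay_ms(developer_busy=False)), "ms before the next suggestion")
```

In this sketch, a run of rejections pushes the delay toward the upper bound of the active range and a run of acceptances pulls it toward the lower bound, while the bounds themselves guarantee that the delay never becomes zero or unbounded. Whether the paper maps acceptance to delay in this direction, or uses the state prediction in this way, is not stated in the abstract.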

