RepoShapley: Shapley-Enhanced Context Filtering for Repository-Level Code Completion
For developers using repository-level code completion, RepoShapley provides a principled way to filter cross-file context when the usefulness of a retrieved chunk depends on which other chunks accompany it.
RepoShapley uses Shapley-based context filtering to decide which retrieved chunks to keep, reducing harmful context and unnecessary retrieval while improving completion quality across benchmarks and backbones.
Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our offline labeling module, ChunkShapley, estimates signed per-chunk effects via teacher-forced probing, feeds them into a lightweight surrogate game that captures saturation and interference, computes exact Shapley values for small retrieval sets, and selects a decoding-optimal coalition through bounded post-verification with the frozen generator. The verified keep/drop decisions and retrieval triggers are then distilled into a single model via discrete control tokens. Experiments across benchmarks and backbones show that RepoShapley improves completion quality while reducing harmful context and unnecessary retrieval.
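To make the exact Shapley step concrete, the following is a minimal sketch and not the paper's implementation. It computes exact Shapley values for a retrieval set of three chunks under a toy surrogate game. The `effects` values, the `conflicts` table, the `tanh` saturation term, and the sign-based keep rule are all illustrative assumptions. In RepoShapley the signed effects come from teacher-forced probing, and the final coalition is chosen by post-verification with the frozen generator, neither of which is modeled here.

```python
from itertools import combinations
from math import factorial, tanh


def exact_shapley(n, value):
    """Exact Shapley values for players 0..n-1 under value(frozenset) -> float.

    Enumerates every coalition, so it is only practical for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Standard Shapley weight for coalitions of size k that exclude i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical signed per-chunk effects (made-up numbers for illustration).
effects = [0.8, 0.6, -0.5]
# Hypothetical pairwise interference penalty: chunks 0 and 2 conflict.
conflicts = {frozenset({0, 2}): 0.3}


def surrogate_value(coalition):
    """Toy surrogate game: saturating gains, additive harm, pairwise conflicts."""
    gain = sum(effects[i] for i in coalition if effects[i] > 0)
    harm = sum(-effects[i] for i in coalition if effects[i] < 0)
    penalty = sum(p for pair, p in conflicts.items() if pair <= coalition)
    # tanh makes helpful chunks saturate: the second helps less than the first.
    return tanh(gain) - harm - penalty


phi = exact_shapley(len(effects), surrogate_value)
# Assumed keep rule for this sketch: retain chunks with positive Shapley value.
keep = [i for i, v in enumerate(phi) if v > 0]
print(phi, keep)
```

Because each chunk requires a pass over all coalitions of the remaining chunks, the cost grows exponentially with the size of the retrieval set, which is consistent with the abstract restricting exact computation to small retrieval sets.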