SE, Apr 17

RepoShapley: Shapley-Enhanced Context Filtering for Repository-Level Code Completion

arXiv:2601.03378
AI Analysis

For developers using repository-level code completion, this paper offers a principled way to filter cross-file context when a chunk's usefulness depends on which other chunks accompany it.

RepoShapley improves repository-level code completion by using Shapley-based context filtering to select optimal chunks, reducing harmful context and unnecessary retrieval while improving completion quality across benchmarks and backbones.

Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our offline labeling module, ChunkShapley, estimates signed per-chunk effects via teacher-forced probing, feeds them into a lightweight surrogate game that captures saturation and interference, computes exact Shapley values for small retrieval sets, and selects a decoding-optimal coalition through bounded post-verification with the frozen generator. The verified keep/drop decisions and retrieval triggers are then distilled into a single model via discrete control tokens. Experiments across benchmarks and backbones show that RepoShapley improves completion quality while reducing harmful context and unnecessary retrieval.
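
To make the "exact Shapley values for small retrieval sets" step concrete, here is a minimal sketch in Python. It is not the paper's implementation: the chunk names, the signed effects in `EFFECTS`, the conflict penalty in `CONFLICTS`, and the `tanh` saturation in `surrogate_value` are all hypothetical stand-ins for the probed effects and surrogate game that the abstract describes only at a high level. The only part that is standard is `exact_shapley`, which enumerates every coalition and is therefore practical only for a handful of chunks.

```python
from itertools import combinations
from math import factorial, tanh


def exact_shapley(players, value):
    """Exact Shapley values by enumerating every coalition (O(2^n) calls to value)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Probability weight of a coalition of size k preceding player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical signed per-chunk effects (assumed, not taken from the paper).
EFFECTS = {"a": 1.0, "b": 0.8, "c": -0.5}
# Hypothetical pairwise interference: chunks "a" and "c" conflict when both are kept.
CONFLICTS = {frozenset({"a", "c"}): 0.3}


def surrogate_value(coalition):
    """Toy surrogate game: helpful effects saturate, harmful ones and conflicts subtract."""
    gain = sum(EFFECTS[c] for c in coalition if EFFECTS[c] > 0)
    harm = sum(-EFFECTS[c] for c in coalition if EFFECTS[c] < 0)
    penalty = sum(w for pair, w in CONFLICTS.items() if pair <= coalition)
    return tanh(gain) - harm - penalty


if __name__ == "__main__":
    phi = exact_shapley(list(EFFECTS), surrogate_value)
    for chunk, contribution in phi.items():
        print(f"{chunk}: {contribution:+.3f}")
```

In this toy game, chunk `c` only ever lowers the coalition value, so its Shapley value is negative, and the values sum to the value of the full coalition (the efficiency property). The abstract's final selection step, bounded post-verification with the frozen generator, is not modeled here.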

