65.4 SE Apr 15
On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation
Jia Feng, Zhanyue Qin, Cuiyun Gao et al.
Repository-level code intelligence tasks require large language models (LLMs) to process long, multi-file contexts. Such inputs pose three challenges: crucial context can be obscured by noise, it can be truncated by limited context windows, and inference latency increases. Context compression mitigates these risks by condensing inputs. Although it has been studied in NLP, its applicability to code tasks remains largely unexplored. We present the first systematic empirical study of context compression for repository-level code intelligence, organizing eight methods into three paradigms: discrete token sequences, continuous latent vectors, and visual tokens. We evaluate them on code completion and generation, measuring both performance and efficiency. Results show that context compression is effective: at 4x compression, continuous latent vector methods surpass full-context performance by up to 28.3% in BLEU score, indicating that they filter noise rather than merely truncate. On efficiency, all paradigms reduce inference cost. Both visual and text-based compression achieve up to a 50% reduction in end-to-end latency at high ratios, approaching the cost of inference without repository context. These findings establish context compression as a viable approach and provide guidance for paradigm selection.
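To make the notion of a compression ratio concrete, here is a minimal sketch of selection-based compression, which falls loosely under the discrete token paradigm because it keeps a subset of the original text. It is not one of the eight methods evaluated in the study. The whitespace tokenizer, the overlap-based ranking, and the names `tokenize` and `compress_context` are all assumptions made for illustration.

```python
def tokenize(text):
    # Assumption: whitespace splitting stands in for a real LLM tokenizer.
    return text.split()


def compress_context(chunks, query, ratio=4.0):
    """Keep the chunks most related to the query within a token budget.

    The budget is total_tokens / ratio, so ratio=4.0 targets 4x compression.
    Returns the kept chunks (in original order) and the achieved ratio.
    """
    query_terms = set(tokenize(query))
    sizes = [len(tokenize(c)) for c in chunks]
    total = sum(sizes)
    budget = total / ratio

    # Rank chunks by how many distinct query terms they contain.
    def overlap(i):
        return len(query_terms & set(tokenize(chunks[i])))

    ranked = sorted(range(len(chunks)), key=overlap, reverse=True)

    kept, used = [], 0
    for i in ranked:
        if used + sizes[i] <= budget:
            kept.append(i)
            used += sizes[i]

    kept.sort()  # restore the original chunk order
    achieved = total / used if used else float("inf")
    return [chunks[i] for i in kept], achieved
```

Because whole chunks are kept or dropped, the achieved ratio is at least the requested one and can be higher when no remaining chunk fits the budget.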
79.6 LG May 18
Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning
Zhanyue Qin, Jia Feng, Yibo Lyu et al.
Code reasoning refers to the task of predicting the output of a program given its source code and specific inputs. It can measure the reasoning capability of large language models (LLMs) and also benefit downstream tasks such as code generation and mathematical reasoning. Existing work has verified the effectiveness of reinforcement learning on this task. However, these methods design rewards solely based on final outputs or coarse-grained signals, and neglect the inherent consistency of the stepwise reasoning process in the task. As a result, they often suffer from sparse rewards or reward hacking, which prevents reinforcement learning from reaching its full potential. To alleviate these issues, we propose CodeThinker, a consistency-driven reinforcement learning framework for code reasoning. Specifically, CodeThinker has three key components: (1) a stepwise reasoning-aware model training module, which utilizes a consistency tracing paradigm as a template to synthesize training data that captures the stepwise reasoning process; (2) a dynamic beam sampling strategy, which aims to improve the quality of sampled outputs under a fixed sampling budget; and (3) a consistency reward mechanism that can effectively alleviate reward hacking. Experiments on three popular benchmarks show that CodeThinker achieves state-of-the-art performance across multiple LLMs. For instance, it outperforms the strongest baseline by 4.3% in accuracy when deployed on Qwen2.5-Coder-7B-Instruct. We also validate the effectiveness of CodeThinker on downstream tasks. Results show that, without additional training, CodeThinker obtains average accuracy gains of 5.33 and 3.11 percentage points on mathematical reasoning and code reasoning tasks covering 17 programming languages, respectively.
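The difference between an output-only reward and a reward that also checks stepwise consistency can be sketched as follows. This is an illustrative toy, not CodeThinker's reward: the reference program, the equal 0.5 weights, and the names `run_with_trace` and `consistency_reward` are assumptions.

```python
def run_with_trace(xs):
    """Reference program: a running sum that records the state after each step."""
    trace, total = [], 0
    for x in xs:
        total += x
        trace.append(total)
    return total, trace


def consistency_reward(pred_trace, pred_output, xs):
    """Score a predicted trace and output against real execution.

    Half of the reward comes from the final output and half from the
    fraction of intermediate states that match (hypothetical weights).
    """
    true_output, true_trace = run_with_trace(xs)

    matched = sum(p == t for p, t in zip(pred_trace, true_trace))
    longest = max(len(true_trace), len(pred_trace))
    step_score = matched / longest if longest else 1.0

    output_score = 1.0 if pred_output == true_output else 0.0
    return 0.5 * step_score + 0.5 * output_score
```

Under this toy scoring, a correct answer with no supporting trace earns only half the reward, which is one simple way to discourage guessing the output without reasoning through the steps.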
74.7 SE Apr 3
Dependency-Guided Repository-Level C-to-Rust Translation with Reinforcement Alignment
Jia Feng, Wenjie Gan, Cuiyun Gao et al.
Automating C-to-Rust migration is critical for improving software security without sacrificing performance. Traditional rule-based methods struggle with diverse C idioms, often producing rigid and unidiomatic Rust code. Large Language Models (LLMs), trained on massive code corpora, offer a promising alternative by leveraging cross-language generalization to generate more idiomatic and maintainable Rust code. However, several challenges remain. First, existing LLM-based approaches fail to handle cross-file dependencies effectively, either ignoring them or including entire files as context, which limits accurate dependency modeling. Second, complex dependencies and structured inputs and outputs make it difficult to verify syntactic correctness and functional equivalence at the repository level. Third, the lack of large-scale C-Rust parallel data constrains model performance. We propose DepTrans, a framework that combines model capability enhancement with structured inference. DepTrans introduces Reinforcement-Aligned Syntax Training to improve generation quality through multi-task fine-tuning and feedback-driven reinforcement learning. It further applies Dependency-Guided Iterative Refinement to capture fine-grained cross-file dependencies and iteratively refine generated Rust code. We construct a dataset of 85k training samples and a benchmark of 145 repository-level instances. Experiments show that DepTrans achieves a 60.7 percent compilation success rate and 43.5 percent computational accuracy, outperforming the strongest baseline by 22.8 and 17.3 percentage points, respectively. It also successfully builds 7 of 15 industrial C projects, demonstrating its practical potential.
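One way to picture dependency-guided context selection is a graph of translation units in which each unit lists the units it depends on. The sketch below is an assumption-laden illustration, not DepTrans's algorithm: the `file:function` unit names and the helpers `translation_order` and `context_for` are hypothetical. It orders units so that dependencies are translated first, and it gathers only the transitive dependencies of a unit as context instead of entire files.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def translation_order(deps):
    """Return units so that each one appears after the units it depends on.

    `deps` maps a unit to the set of units it depends on.
    Raises graphlib.CycleError if the dependencies are cyclic.
    """
    return list(TopologicalSorter(deps).static_order())


def context_for(unit, deps):
    """Collect the transitive dependencies of one unit, excluding the unit itself."""
    seen, stack = set(), list(deps.get(unit, ()))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(deps.get(d, ()))
    seen.discard(unit)  # guards against cyclic graphs
    return seen


# Hypothetical three-file C project.
deps = {
    "main.c:main": {"parse.c:parse", "log.c:log"},
    "parse.c:parse": {"log.c:log"},
    "log.c:log": set(),
}
order = translation_order(deps)
```

Here `log.c:log` is translated first and `main.c:main` last, and the context for `parse.c:parse` contains only `log.c:log` rather than every file in the repository.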