CL · SE · Mar 10, 2025

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

arXiv:2503.07358v1 · 9 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This paper addresses the problem of providing scalable execution feedback for code generation models. It offers a practical solution for researchers and developers, though the advance is incremental because it builds on existing sandbox testing concepts.

The authors tackle the challenge of constructing repository-level coding environments at scale for both training and evaluation. They introduce RepoST, which uses sandbox testing to isolate a target function and its dependencies in a separate script. Training with the execution feedback from the resulting RepoST-Train set yields a gain of 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval.

We present RepoST, a scalable method to construct environments that provide execution feedback for repository-level code generation, for both training and evaluation. Unlike existing works that aim to build entire repositories for execution, which is challenging for both humans and LLMs, we provide execution feedback with sandbox testing, which isolates a given target function and its dependencies in a separate script for testing. Sandbox testing reduces the complexity of external dependencies and enables constructing environments at a large scale. We use our method to construct RepoST-Train, a large-scale training set with 7,415 functions from 832 repositories. Training with the execution feedback provided by RepoST-Train leads to a performance gain of 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval. We also build an evaluation dataset, RepoST-Eval, and benchmark 12 code generation models.
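
To make the idea of sandbox testing concrete, here is a minimal sketch. It is not the authors' pipeline. It assumes a single-file "repository", resolves only same-file function dependencies, and uses hypothetical names (`REPO_SOURCE`, `build_sandbox`, `run_sandbox_test`). It copies the target function, the helpers it transitively calls, and the imports they use into a standalone script, then runs a test against that script alone. It requires Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical stand-in for one file of a larger repository.
REPO_SOURCE = '''
import os
import math

def _scale(x, factor):
    return x * factor

def unrelated():
    return os.getcwd()

def area_of_circle(radius):
    return _scale(math.pi, radius ** 2)
'''


def build_sandbox(source: str, target: str) -> str:
    """Return a standalone script holding `target` and its same-file dependencies."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    imports = [n for n in tree.body if isinstance(n, (ast.Import, ast.ImportFrom))]

    # Transitive closure over functions referenced by name from the target.
    seen, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name in seen or name not in funcs:
            continue
        seen.add(name)
        for node in ast.walk(funcs[name]):
            if isinstance(node, ast.Name) and node.id in funcs:
                stack.append(node.id)

    # Keep the original definition order so the script reads naturally.
    kept_funcs = [n for n in tree.body
                  if isinstance(n, ast.FunctionDef) and n.name in seen]

    # Keep an import statement if any name it binds is used by a kept function.
    used = {n.id for f in kept_funcs for n in ast.walk(f) if isinstance(n, ast.Name)}
    kept_imports = [
        imp for imp in imports
        if any((a.asname or a.name.split(".")[0]) in used for a in imp.names)
    ]

    return "\n\n".join(ast.unparse(n) for n in kept_imports + kept_funcs) + "\n"


def run_sandbox_test(script: str, test_code: str) -> bool:
    """Run the test against the isolated script; True means the test passed.

    Simplification: this uses exec() in a fresh namespace. A real setup would
    more likely write the script to a file and run it in a separate process.
    """
    namespace = {}
    try:
        exec(script + "\n" + test_code, namespace)
    except Exception:
        return False
    return True


if __name__ == "__main__":
    sandbox = build_sandbox(REPO_SOURCE, "area_of_circle")
    print(sandbox)
    test = "import math\nassert abs(area_of_circle(2) - 4 * math.pi) < 1e-9"
    print("test passed:", run_sandbox_test(sandbox, test))
```

The boolean returned by `run_sandbox_test` is the kind of execution feedback the abstract refers to: a pass/fail signal obtained without building or installing the whole repository. Note that `unrelated` and `import os` never reach the sandbox script, which is how isolation reduces the external dependencies that must be satisfied.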

Code Implementations: 1 repo
