Refactoring Codebases through Library Design
This work addresses the challenge of making code more maintainable and reusable, which matters increasingly as code agents are used for isolated programming tasks. The contribution appears incremental, however, since it builds on existing refactoring and library generation techniques.
The paper tackles the problem of refactoring specialized code into reusable libraries to improve software maintainability. It introduces a benchmark (MiniCode) and a method (Librarian), compares Librarian against state-of-the-art library generation methods, and studies it on real-world codebases.
Maintainable and general software allows developers to build robust applications efficiently, yet achieving these qualities often requires refactoring specialized solutions into reusable components. This challenge becomes particularly relevant as code agents are increasingly used to solve isolated, one-off programming problems. We investigate code agents' capacity to refactor code in ways that support growth and reusability. We first ask what makes a good refactoring, finding via simulation results and a human study that Minimum Description Length best correlates with preferable refactorings. We then present both a benchmark and a method for refactoring: MiniCode, a benchmark where multiple files must be refactored into a shared library, and Librarian, a sample-and-rerank method for generating reusable libraries. We compare Librarian to state-of-the-art library generation methods, and study it on real-world codebases.
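The sample-and-rerank idea, with description length as the ranking signal, can be illustrated with a minimal sketch. This is not Librarian's implementation. The `Candidate` structure, the `sample` and `passes_tests` callables, and the whitespace-token count used as a stand-in for description length are all assumptions made for illustration; the paper's actual Minimum Description Length measure may be computed differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Candidate:
    """One sampled refactoring (hypothetical structure): a shared
    library plus the rewritten source of each file that uses it."""
    library: str
    programs: Dict[str, str]


def description_length(candidate: Candidate) -> int:
    """Crude description-length proxy (assumption): the number of
    whitespace-separated tokens in the library and all programs."""
    texts = [candidate.library, *candidate.programs.values()]
    return sum(len(text.split()) for text in texts)


def sample_and_rerank(
    sample: Callable[[], Candidate],
    passes_tests: Callable[[Candidate], bool],
    n_samples: int = 8,
) -> Optional[Candidate]:
    """Draw n_samples candidate refactorings, discard those that fail
    the tests, and return the remaining one with the smallest
    description length (or None if no candidate passes)."""
    candidates = [sample() for _ in range(n_samples)]
    valid = [c for c in candidates if passes_tests(c)]
    return min(valid, key=description_length, default=None)
```

In this sketch, `sample` would wrap whatever produces a candidate refactoring (for example, a call to a code agent), and `passes_tests` would run the existing test suite against the rewritten files. Filtering on correctness before ranking keeps a short but broken refactoring from winning on length alone.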