Test Case Generation for Program Repair: A Study of Feasibility and Effectiveness
This study addresses the overfitting problem in automated program repair for software developers, but its results are incremental: the proposed approaches were not effective at improving patch correctness.
The study investigated whether generating additional test cases could reduce overfitting patches in test-suite-based program repair. It evaluated two approaches on 224 bugs from the Defects4J repository and found that test case generation changed the resulting patches but did not effectively convert incorrect patches into correct ones.
Among the many kinds of program repair techniques, one widely studied family is test-suite-based repair. Test suites are in essence input-output specifications and are therefore typically inadequate for completely specifying the expected behavior of the program under repair. Consequently, the patches generated by test-suite-based program repair techniques pass the test suite, yet may be incorrect. Patches that are overly specific to the test suite used and fail to generalize to other test cases are called overfitting patches. In this paper, we investigate the feasibility and effectiveness of test case generation in alleviating the overfitting issue. We propose two approaches for using test case generation to improve test-suite-based repair, and perform an extensive evaluation of their effectiveness on 224 bugs of the Defects4J repository. The results indicate that test case generation can change the resulting patch, but is not effective at turning incorrect patches into correct ones. We identify the problems underlying this ineffectiveness, and anticipate that our results and findings will lead to future research on test case generation techniques that are tailored to automatic repair systems.
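To make the notion of an overfitting patch concrete, the following is a minimal illustrative sketch and is not taken from the study. All function names, the toy bug, and both test suites are hypothetical. The sketch also assumes that the expected outputs of the additional tests are known, which is not guaranteed for automatically generated tests.

```python
# Hypothetical toy example: a buggy "maximum of two numbers" function,
# an overfitting patch, and a correct patch.

def buggy_max(a, b):
    # Bug: always returns the first argument.
    return a

def overfitting_patch(a, b):
    # Special-cases the single failing input of the original suite,
    # so it passes that suite without fixing the underlying defect.
    if a == 1 and b == 2:
        return 2
    return a

def correct_patch(a, b):
    # Implements the intended behavior for all inputs.
    return a if a >= b else b

# Each test is ((inputs), expected_output).
ORIGINAL_SUITE = [((3, 1), 3), ((1, 2), 2)]
# Additional tests, with expected outputs assumed to be known.
ADDITIONAL_TESTS = [((0, 5), 5), ((-4, -2), -2)]

def passes(func, suite):
    """Return True if func yields the expected output on every test."""
    return all(func(*args) == expected for args, expected in suite)

if __name__ == "__main__":
    for name, func in [("buggy", buggy_max),
                       ("overfitting", overfitting_patch),
                       ("correct", correct_patch)]:
        print(name,
              passes(func, ORIGINAL_SUITE),
              passes(func, ADDITIONAL_TESTS))
```

In this sketch the overfitting patch passes the original suite but fails the additional tests, while the correct patch passes both. Additional tests can therefore expose an overfitting patch, but only when they come with a reliable expected output.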