An Empirical Study of Automated Unit Test Generation for Python
This work addresses the challenge of automated unit test generation for dynamically typed languages such as Python. The contribution is incremental: it builds on prior tools and algorithms.
The study extends the Pynguin framework to support more Python language features and evaluates evolutionary test-generation algorithms, among which DynaMOSA achieved the highest coverage. The results confirm that evolutionary algorithms can outperform random generation, but they also highlight remaining issues such as the limits of type inference.
Various mature automated test generation tools exist for statically typed programming languages such as Java. Automatically generating unit tests for dynamically typed programming languages such as Python, however, is substantially more difficult due to the dynamic nature of these languages as well as the lack of type information. Our Pynguin framework provides automated unit test generation for Python. In this paper, we extend our previous work on Pynguin to support more aspects of the Python language and to study a larger variety of well-established, state-of-the-art test-generation algorithms, namely DynaMOSA, MIO, and MOSA. Furthermore, we improved our Pynguin tool to generate regression assertions, whose quality we also evaluate. Our experiments confirm that evolutionary algorithms can also outperform random test generation in the context of Python and that, as in the Java world, DynaMOSA yields the highest coverage results. However, our results also demonstrate that fundamental issues remain, such as inferring type information for code that lacks it, which currently limit the effectiveness of test generation for Python.
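To illustrate why missing type information limits test generation, the following minimal Python sketch compares the argument types a generator would have to consider for an annotated and an unannotated function. It is not Pynguin's implementation: the value pool `_VALUE_POOL`, the helpers `candidate_types` and `random_arguments`, and the two example functions are hypothetical names introduced only for this illustration.

```python
import inspect
import random
import typing

# Hypothetical value pool (an assumption for this sketch); a real test
# generator would draw from a far richer set of types and values.
_VALUE_POOL = {
    int: lambda rng: rng.randint(-10, 10),
    str: lambda rng: rng.choice(["", "a", "abc"]),
    float: lambda rng: rng.uniform(-1.0, 1.0),
}


def candidate_types(func):
    """Return, per parameter, the types a generator would have to try."""
    hints = typing.get_type_hints(func)
    result = {}
    for name in inspect.signature(func).parameters:
        if hints.get(name) in _VALUE_POOL:
            # A usable type hint narrows the search to a single type.
            result[name] = [hints[name]]
        else:
            # Without a hint, every known type remains a candidate.
            result[name] = list(_VALUE_POOL)
    return result


def random_arguments(func, rng):
    """Pick one random value per parameter from its candidate types."""
    return {
        name: _VALUE_POOL[rng.choice(types)](rng)
        for name, types in candidate_types(func).items()
    }


def annotated(x: int, y: int):
    return x // (y * y + 1)


def unannotated(x, y):
    return x // (y * y + 1)
```

For `annotated`, each parameter has exactly one candidate type, so every generated call is well-typed. For `unannotated`, each parameter has three candidates in this sketch, and many of the resulting combinations (for example, two strings) raise a `TypeError` before reaching any interesting code, which wastes search budget.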