SE | LG | Aug 25, 2022

A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks

arXiv:2208.12136v3 | 16 citations | h-index: 48
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap for software testing practitioners by providing empirical insights to help choose DRL frameworks, though it is incremental as it builds on existing methods without introducing new paradigms.

The paper tackles the lack of empirical evaluation of Deep Reinforcement Learning (DRL) frameworks for software testing by comparing their effectiveness on two tasks: test case prioritization and game testing. It finds that some frameworks, such as Tensorforce, outperform recent approaches in the literature, and that performance differences between implementations of the algorithms can be considerable.

Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL either by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study empirically evaluates the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the application of carefully selected DRL algorithms to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game and detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches in the literature. To prioritize test cases, we run experiments on a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between implemented algorithms is considerable in some cases, motivating further investigation.

Code Implementations: 1 repo
