Hawkeye: Change-targeted Testing for Android Apps based on Deep Reinforcement Learning
Hawkeye addresses the need for faster, more effective testing of app updates, though the contribution is incremental because it builds on existing change-focused testing methods.
The paper tackles inefficient testing of Android app updates by proposing Hawkeye, a tool that uses deep reinforcement learning to prioritise GUI actions associated with code changes. In an evaluation on 10 open-source apps and 1 commercial app, Hawkeye targets changed functions more reliably than the state-of-the-art tools FastBot2 and ARES, and performs comparably on the smaller open-source apps.
Android Apps are frequently updated to keep up with changing user, hardware, and business demands. Ensuring the correctness of App updates through extensive testing is crucial to prevent bugs from reaching the end user. Existing Android testing tools generate GUI events that focus on improving test coverage of the entire App rather than prioritising updates and their impacted elements. Recent research has proposed change-focused testing, but it relies on random exploration to exercise the updates and impacted GUI elements, which is ineffective and slow for large, complex Apps with a huge input exploration space. We propose directed testing of App updates with Hawkeye, which prioritises executing GUI actions associated with code changes based on deep reinforcement learning from historical exploration data. Our empirical evaluation compares Hawkeye with the state-of-the-art model-based and reinforcement learning-based testing tools FastBot2 and ARES using 10 popular open-source Apps and 1 commercial App. We find that Hawkeye generates GUI event sequences targeting changed functions more reliably than FastBot2 and ARES for the open-source Apps and the large commercial App. Hawkeye achieves comparable performance on smaller open-source Apps with a more tractable exploration space. The industrial deployment of Hawkeye in the development pipeline also shows that Hawkeye is well suited to smoke testing merge requests of a complicated commercial App.
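To make the idea of change-targeted prioritisation concrete, the following is a minimal sketch and not Hawkeye's implementation. It swaps deep reinforcement learning for plain tabular Q-learning on a tiny hand-written GUI model. The app model `APP`, the changed-function set `CHANGED`, the 0/1 reward, and all hyperparameters are hypothetical assumptions chosen for illustration; the abstract does not specify any of them.

```python
import random

# Hypothetical toy app: GUI state -> {action: (next_state, functions executed)}.
APP = {
    "home": {
        "open_settings": ("settings", {"Settings.onCreate"}),
        "open_feed": ("feed", {"Feed.load"}),
    },
    "settings": {
        "toggle_sync": ("settings", {"Sync.toggle"}),
        "back": ("home", set()),
    },
    "feed": {
        "refresh": ("feed", {"Feed.load"}),
        "back": ("home", set()),
    },
}

# Assumed result of diffing two app versions: the functions that changed.
CHANGED = {"Sync.toggle"}


def reward(executed):
    """Return 1.0 if the action executed any changed function, else 0.0."""
    return 1.0 if executed & CHANGED else 0.0


def train(episodes=200, steps=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over the toy app."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, actions in APP.items() for a in actions}
    for _ in range(episodes):
        state = "home"  # every episode restarts the app
        for _ in range(steps):
            actions = list(APP[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, executed = APP[state][action]
            best_next = max(q[(next_state, a)] for a in APP[next_state])
            target = reward(executed) + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_action(q, state):
    """Pick the action with the highest learned value in the given state."""
    return max(APP[state], key=lambda a: q[(state, a)])
```

After training, the greedy policy steers from the home screen towards the screen whose action executes the changed function, which is the behaviour the abstract contrasts with random exploration. A real tool would replace the lookup table with a neural network over GUI state features, because the state space of a large App cannot be enumerated.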