SE · Mar 31

Enhancing LLM-Based Bug Reproduction for Android Apps via Pre-Assessment of Visual Effects

Xiangyang Xiao, Huaxun Huang, Rongxin Wu

arXiv:2603.29623 · 56.5 · Has Code

AI Analysis

This work addresses the time-consuming, complex process of reproducing user-reported bugs in Android apps and reports a substantial improvement over existing methods.

The paper tackles the problem of automating bug reproduction for Android apps by proposing LTGDroid, which uses pre-assessment of visual effects to guide an LLM, achieving an 87.51% success rate and improving over baselines by up to 556.30%.
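To put the improvement figures in context, the short calculation below backs out the baseline success rates they would imply. It assumes the reported percentages are relative improvements (new rate divided by baseline rate, minus one); the summary does not state the baseline rates, so the outputs are an inference and not figures from the paper. The function name is illustrative.

```python
def implied_baseline_rate(new_rate: float, relative_improvement_pct: float) -> float:
    """Baseline rate implied by a relative improvement.

    Assumes: new_rate = baseline * (1 + relative_improvement_pct / 100).
    """
    return new_rate / (1.0 + relative_improvement_pct / 100.0)


LTGDROID_RATE = 87.51  # success rate (%) reported in the summary

# Relative improvements (%) over the two baselines, as reported.
for improvement in (49.16, 556.30):
    baseline = implied_baseline_rate(LTGDROID_RATE, improvement)
    print(f"+{improvement}% relative -> implied baseline of about {baseline:.2f}%")
```

Under that assumption the two baselines would sit near 58.67% and 13.33%, which is why the second improvement figure looks so large: a small baseline inflates any relative gain.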

In the development and maintenance of Android apps, the quick and accurate reproduction of user-reported bugs is crucial to ensure application quality and improve user satisfaction. However, this process is often time-consuming and complex. Therefore, there is a need for an automated approach that can explore the Application Under Test (AUT) and identify the correct sequence of User Interface (UI) actions required to reproduce a bug, given only a complete bug report. Large Language Models (LLMs) have shown remarkable capabilities in understanding textual and visual semantics, making them a promising tool for planning UI actions. Nevertheless, our study shows that even when using state-of-the-art LLM-based approaches, these methods still struggle to follow detailed bug reproduction instructions and replan based on new information, due to their inability to accurately predict and interpret the visual effects of UI components. To address these limitations, we propose LTGDroid. Our insight is to execute all possible UI actions on the current UI page during exploration, record their corresponding visual effects, and leverage these visual cues to guide the LLM in selecting UI actions that are likely to reproduce the bug. We evaluated LTGDroid, instantiated with GPT-4.1, on a benchmark consisting of 75 bug reports from 45 popular Android apps. The results show that LTGDroid achieves a reproduction success rate of 87.51%, improving over the state-of-the-art baselines by 49.16% and 556.30%, while requiring an average of 20.45 minutes and approximately $0.27 to successfully reproduce a bug. The LTGDroid implementation is publicly available at https://github.com/N3onFlux/LTGDroid.
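The core idea in the abstract (try every UI action on the current page, record what each one visibly does, then choose using those observations) can be illustrated with a minimal sketch. This is not LTGDroid's implementation: the toy app table, the text descriptions standing in for recorded visual effects, and the word-overlap scorer standing in for the LLM are all assumptions made for illustration. A real system would also need to restore the app state after each trial action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ObservedEffect:
    action: str  # UI action that was tried
    effect: str  # text stand-in for the recorded visual effect


# Hypothetical app model: (page, action) -> what the screen shows afterwards.
TOY_APP: Dict[Tuple[str, str], str] = {
    ("home", "tap_icon_1"): "settings screen opens",
    ("home", "tap_icon_2"): "new note editor opens",
    ("home", "tap_icon_3"): "search bar appears",
}


def available_actions(page: str) -> List[str]:
    """List every action the toy app exposes on the given page."""
    return [action for (p, action) in TOY_APP if p == page]


def pre_assess(page: str) -> List[ObservedEffect]:
    """Try every action on the page and record its effect before choosing."""
    return [ObservedEffect(a, TOY_APP[(page, a)]) for a in available_actions(page)]


def overlap_score(step: str, effect: str) -> int:
    """Stand-in for the LLM's judgment: count words shared by step and effect."""
    return len(set(step.lower().split()) & set(effect.lower().split()))


def select_action(
    step: str,
    observed: List[ObservedEffect],
    score: Callable[[str, str], int] = overlap_score,
) -> str:
    """Pick the action whose recorded effect best matches the bug-report step."""
    return max(observed, key=lambda o: score(step, o.effect)).action


if __name__ == "__main__":
    effects = pre_assess("home")
    print(select_action("Open the note editor", effects))
```

The icon names carry no meaning on their own, so a planner that only sees the widgets would have to guess; once each action's effect has been observed, the choice becomes a matching problem against the bug report's wording.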
