SE · May 3

Scenario-Guided LLM-based Mobile App GUI Testing

arXiv:2506.05079 · 93.37 citations · h-index: 18
AI Analysis

For mobile app testers, this approach bridges the gap between automated GUI testing and business logic, automating scenario-driven testing that previously had to be done by hand.

ScenGen uses a multi-agent LLM framework to automate mobile app GUI testing guided by specific business scenarios, achieving a 93.5% scenario completion rate and detecting 37 new bugs across 10 apps, and outperforming existing tools by 20-30% in coverage.

Quality assurance of mobile app GUIs has become increasingly important, as the GUI is the primary medium of interaction between users and apps. Although numerous automated GUI testing approaches have been developed with diverse strategies, a substantial gap remains between these approaches and the underlying app business logic. Most existing approaches focus on general exploration rather than on completing specific testing scenarios, and as a result often miss critical functionality. Inspired by the manual testing process, which treats business-logic-driven testing scenarios as the fundamental unit of testing, this paper introduces an approach that leverages large language models (LLMs) to comprehend the semantics expressed in app GUIs and their relevance to given testing scenarios. Building on this capability, we propose ScenGen, a novel scenario-guided LLM-based GUI testing framework that employs a multi-agent collaboration mechanism to simulate and automate the phases of manual testing.

ScenGen integrates five agents. The Observer perceives the app GUI state by extracting and structuring GUI widgets and layouts, thereby interpreting the semantic information the GUI presents. This information is passed to the Decider, which makes scenario-driven decisions with the guidance of LLMs to identify target widgets and determine the appropriate actions for fulfilling specific testing goals. The Executor performs the decided operations on the app, while the Supervisor verifies whether the execution results match the intended completion of the testing scenario, ensuring traceability and consistency between test generation and execution. Finally, the Recorder stores the corresponding GUI operations in a context memory that serves as a knowledge base for subsequent decisions, and concurrently monitors the app for runtime bugs.
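The five-agent loop described in the abstract can be sketched in Python to show how the agents hand off to one another. This is a minimal, hypothetical illustration, not ScenGen's implementation: the LLM call inside the Decider is replaced by a simple keyword match, the device is a toy in-memory `MockApp`, and every name here (`Step`, `Memory`, `MockApp`, `run_scenario`, `build_demo_app`, the demo screens and widget IDs) is an assumption made for illustration.

```python
# Hypothetical sketch of a five-agent scenario loop in the spirit of ScenGen.
# The LLM-driven Decider is replaced by keyword matching; the app is simulated.
from dataclasses import dataclass, field

@dataclass
class Step:
    intent: str           # keyword the Decider looks for, e.g. "log in"
    expected_screen: str  # screen the Supervisor expects after the step

@dataclass
class Memory:
    history: list = field(default_factory=list)  # (intent, widget_id, ok) tuples
    bugs: list = field(default_factory=list)     # runtime bug reports

class MockApp:
    """Hypothetical stand-in for the device under test."""
    def __init__(self, screens, transitions, crash_on=()):
        self.screens = screens          # screen name -> {widget_id: label}
        self.transitions = transitions  # (screen, widget_id) -> next screen
        self.crash_on = set(crash_on)   # (screen, widget_id) pairs that crash
        self.current = "home"
        self.crashed = False

    def click(self, widget_id):
        key = (self.current, widget_id)
        self.crashed = key in self.crash_on
        self.current = self.transitions.get(key, self.current)

def observer(app):
    # Perceive the GUI: structure visible widgets as {widget_id: lowercase label}.
    return {wid: label.lower() for wid, label in app.screens[app.current].items()}

def decider(gui, step, memory):
    # Stand-in for the LLM: pick the first widget whose label mentions the
    # step intent, skipping widgets that already failed for this step.
    failed = {w for intent, w, ok in memory.history if intent == step.intent and not ok}
    for wid, label in gui.items():
        if step.intent in label and wid not in failed:
            return wid
    return None

def executor(app, widget_id):
    # Act on the (simulated) device.
    app.click(widget_id)

def supervisor(app, step):
    # A step counts as done only if the app did not crash and reached the expected screen.
    return not app.crashed and app.current == step.expected_screen

def recorder(memory, app, step, widget_id, ok):
    memory.history.append((step.intent, widget_id, ok))
    if app.crashed:
        memory.bugs.append(f"crash after {widget_id!r} during step {step.intent!r}")

def run_scenario(app, steps, max_actions=20):
    memory = Memory()
    done = 0
    for _ in range(max_actions):
        if done == len(steps):
            break
        step = steps[done]
        widget_id = decider(observer(app), step, memory)
        if widget_id is None:
            break  # no candidate widget left for this step
        executor(app, widget_id)
        ok = supervisor(app, step)
        recorder(memory, app, step, widget_id, ok)
        if ok:
            done += 1
    return done == len(steps), memory

def build_demo_app(crash_on=()):
    # Toy home -> login -> profile -> cart flow (all names are illustrative).
    screens = {
        "home": {"btn_login": "Log in", "btn_help": "Help"},
        "login": {"btn_submit": "Submit login", "btn_back": "Back"},
        "profile": {"btn_cart": "Open cart"},
        "cart": {"btn_checkout": "Checkout"},
    }
    transitions = {
        ("home", "btn_login"): "login",
        ("login", "btn_submit"): "profile",
        ("profile", "btn_cart"): "cart",
    }
    return MockApp(screens, transitions, crash_on)

DEMO_STEPS = [Step("log in", "login"), Step("submit", "profile"), Step("cart", "cart")]

if __name__ == "__main__":
    completed, memory = run_scenario(build_demo_app(), DEMO_STEPS)
    print(completed, memory.history)
    completed, memory = run_scenario(build_demo_app({("profile", "btn_cart")}), DEMO_STEPS)
    print(completed, memory.bugs)
```

In the first demo run every step reaches its expected screen, so the scenario completes; in the second, the injected crash makes the Supervisor reject the last step and the Recorder logs a bug. Replacing the keyword match in `decider` with an LLM prompt built from the structured GUI, the current step, and `memory.history` would bring the sketch closer to the described design while leaving the loop itself unchanged.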
