SE, AI · Nov 28, 2025

Autonomous QA Agent: A Retrieval-Augmented Framework for Reliable Selenium Script Generation

arXiv:2601.06034v1

Originality: Incremental advance

AI Analysis

The work addresses the manual, error-prone process of writing test scripts for software testing. Its contribution is incremental: it applies RAG to a specific domain.

The paper targets LLM hallucination of UI elements when generating Selenium test scripts from requirements. On e-commerce scenarios it reports 100% syntax validity and 90% execution success, compared to 30% for standard LLM generation.

Software testing is critical in the software development lifecycle, yet translating requirements into executable test scripts remains manual and error-prone. While Large Language Models (LLMs) can generate code, they often hallucinate non-existent UI elements. We present the Autonomous QA Agent, a Retrieval-Augmented Generation (RAG) system that grounds Selenium script generation in project-specific documentation and HTML structure. By ingesting diverse formats (Markdown, PDF, HTML) into a vector database, our system retrieves relevant context before generation. Evaluation on 20 e-commerce test scenarios shows our RAG approach achieves 100% (20/20) syntax validity and 90% (18/20, 95% CI: [85%, 95%], p < 0.001) execution success, compared to 30% for standard LLM generation. While our evaluation is limited to a single domain, our method significantly reduces hallucinations by grounding generation in actual DOM structure, demonstrating RAG's potential for automated UI testing.
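The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a bag-of-words cosine similarity stands in for the vector database, the LLM call is left out, and every function name (`retrieve`, `build_prompt`, `hallucinated_ids`) is hypothetical. It only shows how retrieved documentation and the element ids actually present in the page might be combined into a prompt, and how a generated script could be checked for locators that do not exist in the DOM.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IdCollector(HTMLParser):
    """Collects every id attribute that appears in the parsed HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def dom_ids(html):
    """Return the set of element ids found in the HTML string."""
    collector = IdCollector()
    collector.feed(html)
    return collector.ids


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query.

    Assumption: a real system would use embeddings and a vector store;
    token-count cosine similarity is used here only to stay self-contained.
    """
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(requirement, chunks, html, k=2):
    """Assemble a generation prompt grounded in retrieved docs and real ids."""
    context = "\n".join(retrieve(requirement, chunks, k))
    ids = ", ".join(sorted(dom_ids(html)))
    return (
        f"Context:\n{context}\n\n"
        f"Valid element ids: {ids}\n\n"
        f"Write a Selenium test for: {requirement}"
    )


def hallucinated_ids(script, html):
    """Return ids used via By.ID in the script that are absent from the DOM."""
    used = set(re.findall(r'By\.ID,\s*["\']([^"\']+)["\']', script))
    return used - dom_ids(html)
```

In this sketch, `build_prompt` would be called before generation and `hallucinated_ids` afterwards as a cheap sanity check: a non-empty result means the generated script refers to an element the page does not contain.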
