CR · AI · May 14

Autonomous Intelligent Agents for Natural-Language-Driven Web Execution with Integrated Security Assurance

arXiv:2605.15281 · 28.7
Predicted impact: top 61% in CR · last 90 days · Originality: Incremental advance
AI Analysis

For software engineers and QA teams, this framework automates brittle web testing and security validation, significantly reducing manual effort and improving reliability.

The paper presents an AI-driven autonomous testing framework that improves web test script generation success from 55% to 93%, reduces navigation failures by 8x, eliminates 80% of timing-related race conditions, and cuts test creation time by 75%. It also extends to natural-language-driven security testing, detecting 85% of authentication bypass and 95% of input validation flaws with false positive rates below 12%.

Modern web test suites rot. A UI refactor breaks locators, a timing change causes race conditions, and within weeks developers abandon the suite entirely. This paper presents an AI-driven autonomous testing framework that addresses these failure modes through five integrated strategies - navigation reliability, context-aware selector generation, post-generation validation, smart wait injection, and failure learning - implemented over a containerised worker architecture that decouples orchestration from long-running browser execution. Evaluated across four production applications and 176 scenarios, the framework improves script generation success from 55% to 93%, achieves an 8x reduction in navigation failures, eliminates 80% of timing-related race conditions, and reduces test creation time by 75% compared to manual Selenium authoring. The framework extends naturally to security validation: testers describe attack scenarios in plain English - "try accessing another user's invoice" - which the agent converts to OWASP Top 10-aligned browser probes, detecting 85% of authentication bypass vulnerabilities and 95% of input validation flaws with false positive rates below 12%. Natural-language-driven security testing of this kind represents, to our knowledge, a novel contribution to the field.
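Of the five strategies named in the abstract, smart wait injection is the easiest to illustrate in isolation. The abstract does not describe how the paper implements it, so the following is only a minimal sketch under stated assumptions: a generated script is modelled as a list of steps, and an explicit wait is inserted before every interaction and after every navigation so that a step never races the page. The names `Step`, `inject_waits`, `wait_for_clickable`, and `wait_for_page_load` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One step of a generated test script (hypothetical model, not the paper's)."""
    action: str
    target: Optional[str] = None


# Actions that touch an element and can therefore race the page render.
INTERACTIONS = {"click", "type", "select"}


def inject_waits(steps: List[Step]) -> List[Step]:
    """Return a copy of `steps` with explicit waits added around risky steps."""
    out: List[Step] = []
    for step in steps:
        if step.action in INTERACTIONS:
            wait = Step("wait_for_clickable", step.target)
            # Skip the wait if the immediately preceding step is the same wait.
            if not out or out[-1] != wait:
                out.append(wait)
        out.append(step)
        if step.action == "navigate":
            # Assumption: every navigation is followed by a page-load wait.
            out.append(Step("wait_for_page_load", step.target))
    return out


script = [
    Step("navigate", "/login"),
    Step("type", "#user"),
    Step("click", "#submit"),
]
hardened = inject_waits(script)
for s in hardened:
    print(s.action, s.target)
```

In a real Selenium-based runner, each wait step would presumably map to an explicit wait on the element or page state rather than a fixed sleep; that mapping is left out here because the abstract does not specify it.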

