SE LO May 29

Specification-Driven Development Benchmark: Security Knowledge Transition

Oleg Grynets, Andrii Salyk, Vasyl Lyashkevych, Oleh Kaskun, Danyil Zhuravchak

arXiv:2606.00167 37.9 h-index: 1

AI Analysis

For developers using LLM-based agents in specification-driven development, this work provides a method to operationalize security knowledge, reducing security failures in generated code.

The paper addresses the security gap in specification-driven AI code generation by proposing a Multilayer Specification Security Model and a Security Knowledge Transition Method. In a backend generation study, modal failures decreased from 50 (baseline) to 42 (ASVS) and 36 (Multilayer Security Model) against a 221-test suite.
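To put the three counts on a common scale, the short sketch below converts them to failure rates and to reductions relative to the baseline. It assumes each reported failure corresponds to one failing test in the 221-test suite, which the summary does not state explicitly. The names `failure_rate` and `relative_reduction` are illustrative and do not come from the paper.

```python
# Failure counts as reported in the summary; the suite has 221 tests.
SUITE_SIZE = 221
failures = {"baseline": 50, "ASVS": 42, "Multilayer Security Model": 36}


def failure_rate(failed: int, total: int = SUITE_SIZE) -> float:
    """Share of the suite that fails (assumes one failure = one failing test)."""
    return failed / total


def relative_reduction(failed: int, reference: int) -> float:
    """Fraction by which `failed` is lower than the `reference` count."""
    return (reference - failed) / reference


for name, failed in failures.items():
    rate = failure_rate(failed)
    drop = relative_reduction(failed, failures["baseline"])
    print(f"{name}: {failed}/{SUITE_SIZE} failing ({rate:.1%}), {drop:.0%} fewer than baseline")
```

Under that assumption, the baseline fails about 22.6% of the suite, ASVS conditioning about 19.0%, and Multilayer Security Model conditioning about 16.3%, which is 16% and 28% fewer failures than the baseline respectively.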

AI-assisted software development is shifting from isolated code completion toward specification-driven generation, where business requirements, technical specifications, and acceptance criteria become operational input for LLM-based development agents. This shift creates a security problem: functional behavior is described explicitly, while security behavior remains implicit, generic, or postponed to post-generation review. As a result, generated systems satisfy visible functional requirements while failing to preserve authorization rules, ownership boundaries, input validation, token rejection, sensitive data handling, and abuse-case semantics. This paper proposes a security knowledge operationalization approach for AI-assisted specification-driven development, combining two contributions. The first is a Multilayer Specification Security Model that represents security knowledge through traceable relations between system entities, threats, risks, requirements, implementation rules, controls, verification scenarios, and evidence. The second is a Security Knowledge Transition Method that transforms business and technical specifications into a validated security-enriched generation contract. We evaluate the approach through two empirical studies: a hidden-oracle study assessing whether an LLM-based pipeline can derive a structured security model from system context, and a backend generation study under three conditions: no explicit security requirements, ASVS-conditioned generation, and Multilayer Security Model conditioning. Evaluated against a hidden 221-test black-box API suite, modal failures decreased from 50 in the baseline to 42 with ASVS and 36 with the Multilayer Security Model, with the strongest improvements in application-specific categories such as business logic and admin safety.
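The abstract describes the Multilayer Specification Security Model only as traceable relations between layers. One way to picture that idea is a small directed graph whose nodes are tagged with a layer, as in the sketch below. This is an illustration under assumptions, not the authors' schema: the class `SecurityModel`, its methods, the layer identifiers, and the example node ids are all hypothetical.

```python
from dataclasses import dataclass, field

# Layer names follow the list in the abstract; the identifiers are hypothetical.
LAYERS = (
    "entity", "threat", "risk", "requirement",
    "implementation_rule", "control", "verification_scenario", "evidence",
)


@dataclass
class SecurityModel:
    # nodes maps a node id to its layer; links holds directed (source, target) pairs.
    nodes: dict[str, str] = field(default_factory=dict)
    links: set[tuple[str, str]] = field(default_factory=set)

    def add(self, node_id: str, layer: str) -> None:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.nodes[node_id] = layer

    def link(self, source: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a link must be registered nodes")
        self.links.add((source, target))

    def reaches(self, start: str, layer: str) -> bool:
        """Return True if a node in `layer` is reachable from `start` via links."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node != start and self.nodes[node] == layer:
                return True
            stack.extend(t for s, t in self.links if s == node)
        return False

    def unverified_requirements(self) -> list[str]:
        """Requirements with no trace to any verification scenario."""
        return sorted(
            n for n, layer in self.nodes.items()
            if layer == "requirement" and not self.reaches(n, "verification_scenario")
        )


# Hypothetical example: one requirement is traced to a test, one is not.
model = SecurityModel()
model.add("order", "entity")
model.add("idor-on-order", "threat")
model.add("req-owner-check", "requirement")
model.add("req-token-expiry", "requirement")
model.add("test-foreign-order-403", "verification_scenario")
model.link("order", "idor-on-order")
model.link("idor-on-order", "req-owner-check")
model.link("req-owner-check", "test-foreign-order-403")

print(model.unverified_requirements())  # ['req-token-expiry']
```

Keeping the relations explicit makes gaps checkable: a requirement that never reaches a verification scenario shows up as a missing trace before generation rather than in post-generation review.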
