SE · AI · May 1

The Productivity-Reliability Paradox: Specification-Driven Governance for AI-Augmented Software Development

arXiv:2605.01160
AI Analysis

For software engineering practitioners and researchers, this paper formalizes a paradoxical phenomenon and offers a governance framework to address it, though the solution is largely conceptual with limited empirical validation.

The paper identifies the Productivity-Reliability Paradox (PRP) in AI-augmented software development, where productivity gains on simple tasks contrast with slowdowns and quality issues in complex settings. It proposes a Specification Governance Model (SGM) and evaluates two instantiations via a four-month pilot, concluding that specification discipline is the key constraint on dependability.

Since 2022, AI-powered coding assistants have produced contradictory evidence: controlled studies report 20-56% productivity gains on well-scoped tasks, while the most rigorous RCT documents a 19% slowdown for experienced developers, and telemetry across 10,000+ developers shows 98% more pull requests but 91% longer review times with flat delivery metrics. This paper argues these findings constitute the Productivity-Reliability Paradox (PRP): a systematic phenomenon emerging from non-deterministic code generators and insufficient specification discipline. Through a multivocal literature review of 67 sources (2022-2026), this paper: (1) formally defines the PRP with three moderating variables (task abstraction, codebase maturity, developer experience) and two amplifying mechanisms (code review bottleneck, context window constraint); (2) proposes the AI-Augmented Methodology Taxonomy (AAMT), classifying six methodologies under three AI integration tiers; (3) introduces the Specification Governance Model (SGM), grounded in Transaction Cost Economics, with a practical governance decision guide; and (4) evaluates Spec Kit and TDAD as SGM instantiations via a four-month pilot study. Specification discipline, not model capability, is the binding constraint on AI-assisted software dependability.

