SE AI · May 25

Meta-Engineering Harnesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report

Satadru Sengupta, Tamunokorite Briggs, Ivan Myshakivskyi

arXiv:2605.256656.8

Predicted impact: top 83% in SE · last 90 days

Originality: Incremental advance

AI Analysis

For organizations requiring continuous software delivery (e.g., CTO-as-a-service), this architecture aims to make AI-native production reliable and auditable, but the results are preliminary and incremental.

The paper presents a meta-engineering harness for AI-native software production that uses contract-driven adversarial verification to continuously produce, verify, and improve software. Early deployment over 17 features revealed contract incompleteness and verification-boundary issues, leading to targeted improvements.

AI-native software development is often evaluated at the level of individual models, prompts, or generated artifacts. This framing is insufficient for production environments where software must be continuously produced, verified, deployed, maintained, and adapted across many operational contexts and long time horizons. We present a meta-engineering harness: a software-production architecture that transforms operational and product feature requirements into explicit contracts, routes work through role-specialized AI agents, performs independent and adversarial verification, and continuously improves itself through structured failure classification and outer-loop calibration. The harness is designed for settings in which software delivery is not a one-time project but an ongoing operating function. In our motivating application, CTO-as-a-service for small service firms, the system manages websites, booking flows, payment systems, backoffice workflow automations, and AI-agent interfaces as continuously evolving technical infrastructure rather than one-off deliverables. We describe the layered architecture, including two-pass contract compilation, persistent markdown memory with specialization records, attention-based and independence-based verifications, a four-way failure arbiter, and outer-loop calibration. We report results from an early production deployment spanning 17 features over several weeks, including a detailed in-app payments case study that revealed contract incompleteness and verification-boundary issues. These observations directly drove targeted improvements to the harness. The contribution is an implemented, measurable, and extensible verification architecture for making AI-native service-as-a-software production reliable, auditable, and improvable over time.
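The contract, verification, and arbiter flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Contract` structure, the `verify` and `arbitrate` functions, and in particular the four `FailureClass` labels, since the abstract states that the arbiter is four-way but does not name its classes. The toy payment clauses are likewise hypothetical and are not taken from the paper's case study.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class FailureClass(Enum):
    # Hypothetical labels: the abstract says the arbiter is four-way
    # but does not list the four classes.
    CONTRACT_INCOMPLETE = "contract_incomplete"
    IMPLEMENTATION_DEFECT = "implementation_defect"
    VERIFIER_ERROR = "verifier_error"
    ENVIRONMENT = "environment"


@dataclass
class Contract:
    """An explicit contract: named clauses, each a predicate over an artifact."""
    feature: str
    clauses: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)


def verify(contract: Contract, artifact: dict) -> List[str]:
    """Return the names of the clauses that the artifact fails."""
    return [name for name, check in contract.clauses.items() if not check(artifact)]


def arbitrate(failed: List[str], observed_incident: bool) -> Optional[FailureClass]:
    """Toy arbiter covering only two of the four assumed classes.

    - A failed clause is attributed to the implementation.
    - An incident with no failed clause suggests the contract missed a requirement.
    - Otherwise there is nothing to classify.
    """
    if failed:
        return FailureClass.IMPLEMENTATION_DEFECT
    if observed_incident:
        return FailureClass.CONTRACT_INCOMPLETE
    return None


# Hypothetical contract for a payments feature.
payments = Contract(
    feature="in_app_payments",
    clauses={
        "amount_positive": lambda a: a.get("amount", 0) > 0,
        "currency_set": lambda a: bool(a.get("currency")),
    },
)

artifact = {"amount": 10, "currency": "USD"}
failed = verify(payments, artifact)
outcome = arbitrate(failed, observed_incident=True)
```

In this sketch the artifact passes every clause, so an incident observed afterwards is classified as a gap in the contract rather than a defect in the implementation. That is one plausible reading of how contract incompleteness could surface, not a description of the authors' arbiter.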
