RocqSmith: Can Automatic Optimization Forge Better Proof Agents?
This work addresses the challenge of automating the tuning of AI agents for formal verification, which is incremental as it explores existing optimization methods without achieving SOTA.
The study investigated whether automatic AI agent optimization methods could improve proof-generation agents in formal verification, specifically for automated theorem proving in Rocq, finding that while some optimizers provided measurable gains, none matched the performance of a carefully engineered state-of-the-art agent.
This work studies the applicability of automatic AI agent optimization methods to real-world agents in formal verification settings, focusing on automated theorem proving in Rocq as a representative and challenging domain. We evaluate how different automatic agent optimizers perform when applied to the task of optimizing a Rocq proof-generation agent, and assess whether parts of the fine-grained tuning of agentic systems, such as prompt design, contextual knowledge, and control strategies, can be automated. Our results show that while several optimizers yield measurable improvements, simple few-shot bootstrapping is the most consistently effective; however, none of the studied methods matches the performance of a carefully engineered state-of-the-art proof agent.