AI | LG | Dec 12, 2024

Neural Interactive Proofs

arXiv:2412.08897v2 | 6 citations | h-index: 13 | ICLR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of building safer AI systems by developing foundational methods for verifiable interactions between agents, though it is incremental in extending existing interactive proof concepts to neural networks.

The paper tackles the problem of enabling a computationally limited verifier to learn to interact with powerful but untrusted neural network provers to solve tasks, introducing a unifying framework and new protocols for neural interactive proofs, with experiments on graph isomorphism and code validation using large language models.

We consider the problem of how a trusted, but computationally bounded agent (a 'verifier') can learn to interact with one or more powerful but untrusted agents ('provers') in order to solve a given task. More specifically, we study the case in which agents are represented using neural networks and refer to solutions of this problem as neural interactive proofs. First we introduce a unifying framework based on prover-verifier games, which generalises previously proposed interaction protocols. We then describe several new protocols for generating neural interactive proofs, and provide a theoretical comparison of both new and existing approaches. Finally, we support this theory with experiments in two domains: a toy graph isomorphism problem that illustrates the key ideas, and a code validation task using large language models. In so doing, we aim to create a foundation for future work on neural interactive proofs and their application in building safer AI systems.
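The verifier/prover split described in the abstract can be illustrated with a hand-written, one-round exchange on the toy graph isomorphism task. This is only a minimal sketch under stated assumptions: the paper's agents are neural networks that learn their protocol, whereas here the prover and verifier are fixed functions, and all names (`honest_prover`, `cheating_prover`, `verifier`, `apply_permutation`) are hypothetical. The point it shows is that the prover may do expensive search, while the verifier only performs a cheap check and therefore does not need to trust the prover.

```python
import itertools
import random


def apply_permutation(edges, perm):
    """Relabel an undirected edge list using perm (a dict node -> node)."""
    return {frozenset((perm[u], perm[v])) for u, v in edges}


def honest_prover(n, edges_a, edges_b):
    """Powerful prover: brute-force search over all n! relabellings."""
    target = {frozenset(e) for e in edges_b}
    for image in itertools.permutations(range(n)):
        perm = dict(enumerate(image))
        if apply_permutation(edges_a, perm) == target:
            return perm
    return None  # no isomorphism exists


def cheating_prover(n, edges_a, edges_b):
    """Untrusted prover: always claims that a random relabelling works."""
    image = list(range(n))
    random.shuffle(image)
    return dict(enumerate(image))


def verifier(n, edges_a, edges_b, prover):
    """Bounded verifier: accepts only if the prover's message checks out."""
    perm = prover(n, edges_a, edges_b)
    if perm is None:
        return False
    # The message must be a bijection on the nodes 0..n-1.
    if sorted(perm) != list(range(n)) or sorted(perm.values()) != list(range(n)):
        return False
    # Polynomial-time check: does the relabelling map graph A onto graph B?
    return apply_permutation(edges_a, perm) == {frozenset(e) for e in edges_b}


path_a = [(0, 1), (1, 2)]          # path 0-1-2
path_b = [(0, 2), (2, 1)]          # path 0-2-1, isomorphic to path_a
triangle = [(0, 1), (1, 2), (0, 2)]

accepts_honest = verifier(3, path_a, path_b, honest_prover)
rejects_cheater = not verifier(3, triangle, path_b, cheating_prover)
```

In this sketch the verifier accepts the honest prover on the two isomorphic paths and rejects the cheating prover on the triangle versus the path, since no relabelling can turn three edges into two. Multi-round protocols and multiple provers, which the paper studies, are not covered here.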
