Verification methods for international AI agreements
This work addresses a challenge facing policymakers and international bodies: ensuring adherence to AI governance agreements. Its contribution is incremental, since it reviews and categorizes existing verification concepts rather than introducing new methods.
The paper examines 10 methods for verifying compliance with international AI agreements, each aimed at detecting violations such as unauthorized AI training or unauthorized data centers. It groups the methods into three categories (national technical means, access-dependent methods, and hardware-dependent methods) and, for each method, provides a description, historical precedents, and possible evasion techniques.
What techniques can be used to verify compliance with international agreements about advanced AI development? In this paper, we examine 10 verification methods that could detect two types of potential violations: unauthorized AI training (e.g., training runs above a certain FLOP threshold) and unauthorized data centers. We divide the verification methods into three categories: (a) national technical means (methods requiring minimal or no access from suspected non-compliant nations), (b) access-dependent methods (methods that require approval from the nation suspected of unauthorized activities), and (c) hardware-dependent methods (methods that require rules around advanced hardware). For each verification method, we provide a description, historical precedents, and possible evasion techniques. We conclude by offering recommendations for future work related to the verification and enforcement of international AI governance agreements.