May 17, 2024

Training Compute Thresholds: Features and Functions in AI Regulation

arXiv:2405.10799v2 · 24 citations · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of AI regulation for policymakers and regulators, proposing a practical but incremental approach to oversight.

The paper argues that training compute is currently the most suitable metric for identifying general-purpose AI models that may pose societal risks, because it correlates with capabilities and risks, is quantifiable, can be measured early in the AI lifecycle, and can be verified by external actors. However, compute thresholds should not be used alone to determine mitigation measures.

Regulators in the US and EU are using thresholds based on training compute (the number of computational operations used in training) to identify general-purpose artificial intelligence (GPAI) models that may pose risks of large-scale societal harm. We argue that training compute is currently the most suitable metric to identify GPAI models that deserve regulatory oversight and further scrutiny. Training compute correlates with model capabilities and risks, is quantifiable, can be measured early in the AI lifecycle, and can be verified by external actors, among other advantageous features. These features make compute thresholds considerably more suitable than other proposed metrics to serve as an initial filter to trigger additional regulatory requirements and scrutiny. However, training compute is an imperfect proxy for risk. As such, compute thresholds should not be used in isolation to determine appropriate mitigation measures. Instead, they should be used to detect potentially risky GPAI models that warrant regulatory oversight, such as through notification requirements, and further scrutiny, such as via model evaluations and risk assessments, the results of which may inform which mitigation measures are appropriate. In fact, this appears largely consistent with how compute thresholds are used today. As GPAI technology and market structures evolve, regulators should update compute thresholds and complement them with other metrics in regulatory review processes.
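To make the "initial filter" role concrete, here is a minimal, hypothetical Python sketch; it is not the authors' method. It assumes training compute is estimated with the common heuristic C ≈ 6 · N · D (N = parameters, D = training tokens) for dense transformers. It also assumes the threshold values from the EU AI Act (more than 10^25 FLOP) and US Executive Order 14110 (more than 10^26 operations). The names `estimate_training_compute`, `screen_model`, and `ReviewOutcome` are illustrative only. Crossing a threshold returns oversight and scrutiny steps rather than a fixed mitigation, mirroring the argument above.

```python
from dataclasses import dataclass

# Assumed threshold values (strictly "greater than" in both legal texts).
EU_AI_ACT_THRESHOLD = 1e25    # FLOP: presumption of systemic risk for GPAI models
US_EO_14110_THRESHOLD = 1e26  # integer or floating-point operations: reporting duty


def estimate_training_compute(n_params: float, n_tokens: float) -> float:
    """Hypothetical helper: approximate training FLOP as 6 * parameters * tokens.

    This is a rough heuristic for dense transformers, not a measurement.
    """
    return 6.0 * n_params * n_tokens


@dataclass
class ReviewOutcome:
    training_compute: float
    crosses_thresholds: list[str]
    next_steps: list[str]


def screen_model(n_params: float, n_tokens: float) -> ReviewOutcome:
    """Use compute thresholds as a filter that triggers oversight and scrutiny."""
    compute = estimate_training_compute(n_params, n_tokens)
    thresholds = {
        "EU AI Act": EU_AI_ACT_THRESHOLD,
        "US EO 14110": US_EO_14110_THRESHOLD,
    }
    crossed = [name for name, limit in thresholds.items() if compute > limit]

    steps: list[str] = []
    if crossed:
        # The threshold only flags the model; mitigations are chosen later,
        # informed by evaluations and risk assessments.
        steps = [
            "notify regulator",
            "run model evaluations",
            "conduct risk assessment",
            "choose mitigations based on evaluation and assessment results",
        ]
    return ReviewOutcome(compute, crossed, steps)


if __name__ == "__main__":
    for params, tokens in [(70e9, 15e12), (400e9, 15e12)]:
        outcome = screen_model(params, tokens)
        print(f"{outcome.training_compute:.2e} FLOP -> {outcome.crosses_thresholds}")
```

For example, under these assumptions a 70-billion-parameter model trained on 15 trillion tokens comes to about 6.3e24 FLOP, below both thresholds. A 400-billion-parameter model on the same data (about 3.6e25 FLOP) crosses only the EU threshold. Note that the sketch never maps a compute value directly to a mitigation, since the paper treats compute as an imperfect proxy for risk.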

