LGSep 9, 2024
Interpolation, Extrapolation, Hyperpolation: Generalising into new dimensionsToby Ord · oxford
This paper introduces the concept of hyperpolation: a way of generalising from a limited set of data points that is a peer to the more familiar concepts of interpolation and extrapolation. Hyperpolation is the task of estimating the value of a function at new locations that lie outside the subspace (or manifold) of the existing data. We shall see that hyperpolation is possible and explore its links to creativity in the arts and sciences. We will also examine the role of hyperpolation in machine learning and suggest that the lack of fundamental creativity in current AI systems is deeply connected to their limited ability to hyperpolate.
AIMay 8, 2025
Is there a half-life for the success rates of AI agents?Toby Ord · oxford
Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model -- a constant rate of failing during each minute a human would take to do the task. This implies an exponentially declining success rate with the length of the task and that each agent could be characterised by its own half-life. This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks -- that they involve increasingly large sets of subtasks where failing any one fails the task. Whether this model applies more generally on other suites of tasks is unknown and an important subject for further work.
CYFeb 12, 2025
Inference Scaling Reshapes AI GovernanceToby Ord · oxford
The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance. The nature of these effects depends crucially on whether this new inference compute will primarily be used during external deployment or as part of a more complex training programme within the lab. Rapid scaling of inference-at-deployment would: lower the importance of open-weight models (and of securing the weights of closed models), reduce the impact of the first human-level models, change the business model for frontier AI, reduce the need for power-intense data centres, and derail the current paradigm of AI governance via training compute thresholds. Rapid scaling of inference-during-training would have more ambiguous effects that range from a revitalisation of pre-training scaling to a form of recursive self-improvement via iterated distillation and amplification.