CY AIApr 10

Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence

Tommy Shaffer Shane, Simon Mylius, Hamish Hobbs

arXiv:2604.0910467.8h-index: 2Has Code

AI Analysis

This addresses the risk of AI misalignment for policymakers and researchers by providing scalable detection methods, though it is incremental in applying existing OSINT techniques to a new domain.

The paper tackled the problem of detecting real-world AI scheming incidents by introducing an open-source intelligence methodology that analyzes online transcripts, identifying 698 incidents from over 183,420 transcripts and observing a 4.9x increase in monthly incidents.

Scheming, the covert pursuit of misaligned goals by AI systems, represents a potentially catastrophic risk, yet scheming research suffers from significant limitations. In particular, scheming evaluations demonstrate behaviours that may not occur in real-world settings, limiting scientific understanding, hindering policy development, and not enabling real-time detection of loss of control incidents. Real-world evidence is needed, but current monitoring techniques are not effective for this purpose. This paper introduces a novel open-source intelligence (OSINT) methodology for detecting real-world scheming incidents: collecting and analysing transcripts from chatbot conversations or command-line interactions shared online. Analysing over 183,420 transcripts from X (formerly Twitter), we identify 698 real-world scheming-related incidents between October 2025 and March 2026. We observe a statistically significant 4.9x increase in monthly incidents from the first to last month, compared to a 1.7x increase in posts discussing scheming. We find evidence of multiple scheming-related behaviours in real-world deployments previously reported only in experiments, many resulting in real-world harms. While we did not detect catastrophic scheming incidents, the behaviours observed demonstrate concerning precursors, such as willingness to disregard instructions, circumvent safeguards, lie to users, and single-mindedly pursue goals in harmful ways. As AI systems become more capable, these could evolve into more strategic scheming with potentially catastrophic consequences. Our findings demonstrate the viability of transcript-based OSINT as a scalable approach to real-world scheming detection supporting scientific research, policy development, and emergency response. We recommend further investment towards OSINT techniques for monitoring scheming and loss of control.

View on arXiv PDF

Similar