CR · May 7

Language Models Can Autonomously Hack and Self-Replicate

Alena Air, Reworr, Nikolaj Kotov, Dmitrii Volkov, John Steidley, Jeffrey Ladish

arXiv:2605.06760 · 76.81 citations

Predicted impact: top 14% in CR (last 90 days) · Originality: Incremental advance

AI Analysis

The paper demonstrates a new autonomous capability for language models that poses security risks. The advance is incremental, however, because the attacks exploit known vulnerability classes.

Language models can autonomously hack vulnerable web applications, extract credentials, and deploy copies of themselves. Open-weight Qwen models achieve up to 33% success in self-replication, matching the current-generation GPT-5.4 and exceeding prior-generation frontier models. When replicating Qwen weights, frontier models reach up to 81% (Opus 4.6).

We demonstrate that language models can autonomously replicate their weights and harness across a network by exploiting vulnerable hosts. The agent independently finds and exploits a web-application vulnerability, extracts credentials, and deploys an inference server with a copy of its harness and prompt on the compromised host. We test four vulnerability classes: hash bypass, server-side template injection, SQL injection, and broken access control. Qwen3.5-122B-A10B succeeds in 6–19% of attempts, and the smaller Qwen3.6-27B reaches 33% on a single A100. This result already matches the current-generation GPT-5.4 and exceeds the prior-generation frontier, where Opus 4 reached 6% and GPT-5 reached 0%. When replicating Qwen weights, frontier models reach 81% (Opus 4.6) and 33% (GPT-5.4). The process chains: a successful replica can repeat it against a new target, producing additional copies autonomously.

