CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models
This work addresses the cybersecurity risks that LLMs pose to third parties, application developers, and end users. Its contribution is incremental: it builds on previous benchmarks by adding new offensive security areas.
The authors address the problem of evaluating cybersecurity risks in large language models by releasing CYBERSECEVAL 3, a new benchmark suite that assesses 8 risks across two categories: risk to third parties, and risk to application developers and end users. They apply it to the Llama 3 models and other contemporaneous state-of-the-art LLMs, reporting results both with and without mitigations in place.
We are releasing a new suite of security benchmarks for LLMs, CYBERSECEVAL 3, to continue the conversation on empirically measuring LLM cybersecurity risks and capabilities. CYBERSECEVAL 3 assesses 8 different risks across two broad categories: risk to third parties, and risk to application developers and end users. Compared to previous work, we add new areas focused on offensive security capabilities: automated social engineering, scaling manual offensive cyber operations, and autonomous offensive cyber operations. In this paper we discuss applying these benchmarks to the Llama 3 models and a suite of contemporaneous state-of-the-art LLMs, enabling us to contextualize risks both with and without mitigations in place.