CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection
This work addresses a critical security concern for developers and users of cryptographic software, although its contribution is incremental because it builds on existing LLM techniques.
The paper tackles the detection of subtle logic flaws in cryptographic implementations. It introduces CryptoScope, a framework that combines Large Language Models with Chain-of-Thought prompting and Retrieval-Augmented Generation. CryptoScope improves on LLM baselines by up to 28.69% and identifies 9 previously undisclosed vulnerabilities in open-source projects.
Cryptographic algorithms are fundamental to modern security, yet their implementations frequently harbor subtle logic flaws that are hard to detect. We introduce CryptoScope, a novel framework for automated cryptographic vulnerability detection powered by Large Language Models (LLMs). CryptoScope combines Chain-of-Thought (CoT) prompting with Retrieval-Augmented Generation (RAG), guided by a curated cryptographic knowledge base containing over 12,000 entries. We evaluate CryptoScope on LLM-CLVA, a benchmark of 92 cases primarily derived from real-world CVE vulnerabilities, complemented by cryptographic challenges from major Capture The Flag (CTF) competitions and synthetic examples across 11 programming languages. CryptoScope consistently improves performance over strong LLM baselines, boosting DeepSeek-V3 by 11.62%, GPT-4o-mini by 20.28%, and GLM-4-Flash by 28.69%. Additionally, it identifies 9 previously undisclosed flaws in widely used open-source cryptographic projects.
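To make the combination of retrieval and Chain-of-Thought prompting concrete, the following is a minimal sketch of how such a pipeline might assemble its prompt. It is not CryptoScope's implementation: the four-entry `KNOWLEDGE_BASE`, the token-overlap scoring, the function names `retrieve` and `build_prompt`, and the prompt wording are all assumptions made for illustration. A real system would likely use embedding-based retrieval over the full knowledge base and send the resulting prompt to an LLM.

```python
import re
from collections import Counter

# Hypothetical stand-in for a curated cryptographic knowledge base.
# These entries are illustrative and are not taken from the paper.
KNOWLEDGE_BASE = [
    "AES in ECB mode leaks plaintext structure; prefer an authenticated mode such as GCM.",
    "Reusing a nonce with AES-GCM or ChaCha20-Poly1305 breaks confidentiality and integrity.",
    "Comparing MAC tags with a non-constant-time equality check enables timing attacks.",
    "RSA with PKCS#1 v1.5 padding is vulnerable to padding-oracle attacks if errors are distinguishable.",
]


def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, k=2):
    """Return up to k entries ranked by token overlap with the query.

    Token overlap is a simple substitute for embedding similarity.
    Entries with no overlap are dropped.
    """
    query_tokens = tokenize(query)
    scored = [
        (sum((query_tokens & tokenize(entry)).values()), entry)
        for entry in KNOWLEDGE_BASE
    ]
    # The sort is stable, so ties keep their knowledge-base order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:k] if score > 0]


def build_prompt(code, k=2):
    """Assemble a prompt from retrieved notes, the code, and a step-by-step instruction."""
    notes = retrieve(code, k)
    context = "\n".join(f"- {note}" for note in notes) or "- (no relevant notes found)"
    return (
        "You are auditing cryptographic code for logic flaws.\n\n"
        f"Relevant background:\n{context}\n\n"
        f"Code under review:\n{code}\n\n"
        "Reason step by step: identify the primitives used, check how each "
        "is configured, compare against the background notes, and then state "
        "whether a vulnerability is present."
    )


if __name__ == "__main__":
    snippet = "cipher = AES.new(key, AES.MODE_ECB)"
    print(build_prompt(snippet))
```

Running the script prints a prompt in which the ECB note is ranked first for the sample snippet, followed by the code and the reasoning instruction. Separating retrieval from prompt assembly keeps the retriever swappable without changing the reasoning template.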