CR | AI | Aug 19, 2025

CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection

arXiv:2508.14128v1 | 3 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses a safety challenge in LLM deployment: defending against prompt injection and structure-aware jailbreak attacks. It offers a practical solution for users and developers, though the contribution appears incremental because it builds on existing prompt-level defense methods.

The paper tackles the problem of jailbreak attacks on large language models by introducing CCFC, a dual-track defense framework that reduces attack success rates by 50-75% compared to state-of-the-art defenses without compromising response quality on benign queries.

Jailbreak attacks pose a serious challenge to the safe deployment of large language models (LLMs). We introduce CCFC (Core & Core-Full-Core), a dual-track, prompt-level defense framework designed to mitigate LLMs' vulnerabilities from prompt injection and structure-aware jailbreak attacks. CCFC operates by first isolating the semantic core of a user query via few-shot prompting, and then evaluating the query using two complementary tracks: a core-only track to ignore adversarial distractions (e.g., toxic suffixes or prefix injections), and a core-full-core (CFC) track to disrupt the structural patterns exploited by gradient-based or edit-based attacks. The final response is selected based on a safety consistency check across both tracks, ensuring robustness without compromising on response quality. We demonstrate that CCFC cuts attack success rates by 50-75% versus state-of-the-art defenses against strong adversaries (e.g., DeepInception, GCG), without sacrificing fidelity on benign queries. Our method consistently outperforms state-of-the-art prompt-level defenses, offering a practical and effective solution for safer LLM deployment.
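The dual-track flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `extract_core`, `generate`, `build_cfc_prompt`, and `ccfc_defend` are hypothetical; the CFC prompt is assumed to be the core placed before and after the full query (inferred from the name "core-full-core"); and the safety consistency check is approximated by a simple keyword-based refusal detector, where a real system would likely use a stronger safety judge.

```python
from typing import Callable

# Hypothetical refusal markers, used only to illustrate the consistency check.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am sorry")


def looks_like_refusal(response: str) -> bool:
    """Return True if the response contains any of the assumed refusal markers."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def build_cfc_prompt(core: str, full: str) -> str:
    """Assumed CFC layout: the core, then the full query, then the core again."""
    return f"{core}\n{full}\n{core}"


def ccfc_defend(
    query: str,
    extract_core: Callable[[str], str],  # e.g. a few-shot prompted LLM call
    generate: Callable[[str], str],      # the protected LLM
    refusal: str = "I'm sorry, but I can't help with that.",
) -> str:
    """Run both tracks and return a response only if neither track refuses."""
    core = extract_core(query)

    # Track 1: core only, so adversarial prefixes and suffixes are dropped.
    core_response = generate(core)

    # Track 2: core-full-core, which changes the structure around the full query.
    cfc_response = generate(build_cfc_prompt(core, query))

    # Assumed consistency rule: a refusal on either track blocks the answer.
    if looks_like_refusal(core_response) or looks_like_refusal(cfc_response):
        return refusal
    return cfc_response
```

Returning the CFC-track response for benign queries is a design choice of this sketch, made because that track still sees the full query and its context; the abstract does not state which track's output is selected.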

