CR · LG · MA · Apr 15, 2025

CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation

arXiv:2504.13201v2 · 3 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses safety risks for embodied intelligence systems such as robots and autonomous vehicles, offering an incremental improvement over existing defenses.

The paper tackles jailbreak risks in embodied intelligence systems driven by large language models. It proposes Concept Enhancement Engineering (CEE), an inference-time defense that raises defense success rates without additional training or external modules.

Large Language Models (LLMs) are increasingly becoming the cognitive core of Embodied Intelligence (EI) systems, such as robots and autonomous vehicles. However, this integration also exposes them to serious jailbreak risks, where malicious instructions can be transformed into dangerous physical actions. Existing defense mechanisms suffer from notable drawbacks--including high training costs, significant inference delays, and complex hyperparameter tuning--which limit their practical applicability. To address these challenges, we propose a novel and efficient inference-time defense framework: Concept Enhancement Engineering (CEE). CEE enhances the model's inherent safety mechanisms by directly manipulating its internal representations, requiring neither additional training nor external modules, thereby improving defense efficiency. Furthermore, CEE introduces a rotation-based control mechanism that enables stable and linearly tunable behavioral control of the model. This design eliminates the need for tedious manual tuning and avoids the output degradation issues commonly observed in other representation engineering methods. Extensive experiments across multiple EI safety benchmarks and diverse attack scenarios demonstrate that CEE significantly improves the defense success rates of various multimodal LLMs. It effectively mitigates safety risks while preserving high-quality generation and inference efficiency, offering a promising solution for deploying safer embodied intelligence systems.
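To make the idea of rotation-based control concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hidden state `h` and a hypothetical "safety concept" direction `c` are available as plain vectors, and rotates `h` toward `c` by an angle `theta` inside the two-dimensional subspace they span. The function names and the choice to cap the rotation at full alignment are illustrative assumptions. The point it shows is that rotation keeps the vector's norm fixed and reduces the angle to `c` by exactly `theta`, which is one plausible reading of "stable and linearly tunable" control.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def angle_between(a, b):
    """Angle in radians between two non-zero vectors."""
    cos_val = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, cos_val)))


def rotate_toward(h, c, theta):
    """Rotate h by up to theta radians toward direction c, preserving |h|.

    Hypothetical helper for illustration; h and c are assumed inputs
    (a hidden state and a concept direction), not names from the paper.
    """
    h_norm, c_norm = norm(h), norm(c)
    if h_norm == 0.0 or c_norm == 0.0:
        raise ValueError("h and c must be non-zero")
    u = [x / h_norm for x in h]          # unit vector along h
    c_hat = [x / c_norm for x in c]      # unit concept direction
    cos_phi = max(-1.0, min(1.0, dot(u, c_hat)))
    # Component of c_hat orthogonal to u: second basis vector of the plane.
    w = [ci - cos_phi * ui for ci, ui in zip(c_hat, u)]
    w_norm = norm(w)
    if w_norm < 1e-12:
        return list(h)                   # (anti-)parallel: plane is undefined
    w = [x / w_norm for x in w]
    phi = math.acos(cos_phi)             # current angle between h and c
    theta = max(0.0, min(theta, phi))    # assumption: never overshoot c
    return [h_norm * (math.cos(theta) * ui + math.sin(theta) * wi)
            for ui, wi in zip(u, w)]


h = [3.0, 0.0, 4.0]   # toy hidden state, orthogonal to c
c = [0.0, 1.0, 0.0]   # toy concept direction
rotated = rotate_toward(h, c, math.pi / 6)
print(norm(rotated), angle_between(rotated, c))
```

In this toy case the norm of the vector is unchanged and the angle to `c` shrinks from π/2 to π/3. Additive steering, by contrast, changes the vector's magnitude as the steering strength grows, which is one commonly cited source of output degradation.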

