CR · Jan 9
Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning
Zhaoqi Wang, Zijian Zhang, Daqing He et al.
Large language models (LLMs) have demonstrated remarkable capabilities across diverse applications; however, they remain critically vulnerable to jailbreak attacks that elicit harmful responses violating human values and safety guidelines. Despite extensive research on defense mechanisms, existing safeguards prove insufficient against sophisticated adversarial strategies. In this work, we propose iMIST (interactive Multi-step Progressive Tool-disguised Jailbreak Attack), a novel adaptive jailbreak method that exploits vulnerabilities in current defense mechanisms. iMIST disguises malicious queries as normal tool invocations to bypass content filters, and introduces an interactive progressive optimization algorithm that dynamically escalates response harmfulness over multi-turn dialogues, guided by real-time harmfulness assessment. Our experiments on widely used models demonstrate that iMIST achieves higher attack effectiveness while maintaining low rejection rates. These results reveal critical vulnerabilities in current LLM safety mechanisms and underscore the urgent need for more robust defense strategies.
CR · Apr 26
Rényi Pufferfish Privacy with Gaussian-based Priors: From Single Gaussian to Mixture Model
Wenjin Yang, Ni Ding, Zijian Zhang et al.
Rényi Pufferfish Privacy (RPP) provides a Rényi divergence-based privacy framework for correlated data, but existing $\infty$-Wasserstein mechanisms are often conservative and sacrifice data utility. We study Gaussian mechanisms for RPP under Gaussian and Gaussian-mixture priors. For single Gaussian priors, we derive the exact Rényi divergence after Gaussian perturbation, obtain a relaxed closed-form sufficient condition for $(\alpha,\epsilon)$-RPP, and characterize the monotonicity of the calibrated noise with respect to the privacy budget $\epsilon$ and the Rényi order $\alpha$. To handle more general non-Gaussian and multimodal priors, we approximate secret-conditioned outputs with Gaussian mixture models and introduce an optimal-transport-based sufficient condition for RPP. Experiments on three UCI datasets with statistical (RAW, MEAN) and model-output (BNN, GP) queries show that our prior-aware mechanisms consistently require less noise than a recent RPP additive-noise baseline, achieving an average noise reduction of 48.9%. These results show that our mechanisms can substantially improve the privacy-utility trade-off under RPP.
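As a rough illustration of why a Gaussian prior can reduce the required noise, the sketch below uses the standard closed form for the Rényi divergence between two univariate Gaussians. It is not the paper's derivation or mechanism. The function names `renyi_gaussian` and `min_noise_variance` are hypothetical, and the calibration assumes the two secret-conditioned outputs are Gaussian with the same variance `prior_var` and differ only in their means.

```python
import math


def renyi_gaussian(mu0, var0, mu1, var1, alpha):
    """Renyi divergence D_alpha(N(mu0, var0) || N(mu1, var1)) in nats.

    Standard closed form for univariate Gaussians, valid for alpha > 0,
    alpha != 1. Returns infinity when the interpolated variance is not positive.
    """
    var_a = (1 - alpha) * var0 + alpha * var1
    if var_a <= 0:
        return math.inf
    quad = alpha * (mu1 - mu0) ** 2 / (2 * var_a)
    log_term = (
        0.5 * math.log(var_a)
        - 0.5 * (1 - alpha) * math.log(var0)
        - 0.5 * alpha * math.log(var1)
    ) / (1 - alpha)
    return quad + log_term


def min_noise_variance(delta_mu, prior_var, alpha, eps):
    """Smallest Gaussian noise variance so that the divergence is <= eps.

    Assumption (illustrative only): both secret-conditioned outputs are
    N(mu_i, prior_var), so after adding N(0, tau2) noise the divergence is
    alpha * delta_mu**2 / (2 * (prior_var + tau2)).
    """
    return max(0.0, alpha * delta_mu ** 2 / (2 * eps) - prior_var)


# Example: the prior's own variance already absorbs part of the budget.
tau2 = min_noise_variance(delta_mu=1.0, prior_var=0.5, alpha=2.0, eps=0.5)
achieved = renyi_gaussian(0.0, 0.5 + tau2, 1.0, 0.5 + tau2, alpha=2.0)
```

Under this equal-variance assumption, the required noise variance shrinks by exactly `prior_var` compared with ignoring the prior, and it decreases as `eps` grows.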
ET · Dec 5, 2024
Mixed Delay/Nondelay Embeddings Based Neuromorphic Computing with Patterned Nanomagnet Arrays
Changpeng Ti, Usman Hassan, Sairam Sri Vatsavai et al.
Patterned nanomagnet arrays (PNAs) have been shown to exhibit a strong, geometrically frustrated dipole interaction, and some PNAs have also shown emergent domain wall dynamics. Previous works have demonstrated methods to physically probe these magnetization dynamics to realize neuromorphic reservoir systems that exhibit chaotic dynamical behavior and high-dimensional nonlinearity. These PNA reservoir systems leverage the echo state properties and the linear/nonlinear short-term memory of their component reservoir nodes to map and preserve the dynamical information of the input time-series data into nondelay spatial embeddings. Such mappings enable the systems to imitate and forecast the input time-series data. However, because these prior systems rely solely on the nondelay spatial embeddings obtained at component reservoir nodes, they require a massive number of reservoir nodes, a very high-dimensional spatial embedding per node, or both, to achieve acceptable imitation and prediction accuracy, which reduces their practical feasibility. To address this shortcoming, we present a mixed delay/nondelay embeddings-based PNA reservoir system. Our system uses a single PNA reservoir node that obtains a mixture of delay and nondelay embeddings of the dynamical information in the time-series data applied at its input. Our analysis shows that when these mixed delay/nondelay embeddings are used to train a perceptron at the output layer, our reservoir system outperforms existing PNA-based reservoir systems for the imitation of NARMA 2, NARMA 5, NARMA 7, and NARMA 10 time-series data, and for the short-term and long-term prediction of the Mackey-Glass time-series data.
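The idea of mixing delay and nondelay coordinates can be sketched generically in software. The example below is only an illustration, not the paper's physical readout scheme: `narma10` implements the commonly used NARMA-10 benchmark recurrence, and `mixed_embedding` is a hypothetical helper that concatenates each time step's current (nondelay) features with copies taken a few steps earlier (delay).

```python
def narma10(u):
    """Commonly used NARMA-10 benchmark recurrence driven by the input list u."""
    y = [0.0] * len(u)
    for t in range(9, len(u) - 1):
        window = sum(y[t - i] for i in range(10))
        y[t + 1] = 0.3 * y[t] + 0.05 * y[t] * window + 1.5 * u[t - 9] * u[t] + 0.1
    return y


def mixed_embedding(states, delays):
    """Concatenate nondelay features with delayed copies of them.

    states: list of per-time-step feature lists (the nondelay readouts).
    delays: list of positive integer lags (hypothetical choice of lags).
    Returns one row per time step t >= max(delays):
    states[t] followed by states[t - d] for each d in delays.
    """
    start = max(delays)
    rows = []
    for t in range(start, len(states)):
        row = list(states[t])
        for d in delays:
            row.extend(states[t - d])
        rows.append(row)
    return rows


# Example: a constant input and a one-feature-per-step readout.
targets = narma10([0.2] * 50)
rows = mixed_embedding([[0], [1], [2], [3]], delays=[1, 2])
```

Each row could then be fed to a linear or perceptron readout trained against `targets`. The delayed copies add dimensions to the embedding without adding reservoir nodes, which is the trade-off the abstract describes.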