Mateusz Dziemian, Maxwell Lin, Xiaohan Fu et al. · eth-zurich
This addresses a critical security threat for users of AI agents in high-stakes settings, revealing fundamental weaknesses in current models.
Encryption, privacy, network security
Yihao Zhang, Zeming Wei, Xiaokun Luan et al.
This addresses critical security risks for users of interconnected multi-agent systems, exposing vulnerabilities that could lead to autonomous attacks without attacker intervention.
Yu He, Haozhe Zhu, Yiming Li et al.
This addresses a critical security vulnerability in LLM agents for users deploying them in untrusted environments, offering a novel defense paradigm that is resilient to adaptive attacks.
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami et al.
This exposes a fundamental flaw in current LLM safety alignment, posing risks for ethical deployment.
Weiyang Guo, Zesheng Shi, Zeen Zhu et al.
This work reveals a new security threat for LLMs trained with RLVR, a paradigm used to improve reasoning, by demonstrating that backdoors can be injected via data poisoning alone.
Zijun Wang, Haoqin Tu, Letian Zhang et al.
This highlights critical security risks for users of personal AI agents with broad system access, revealing inherent architectural vulnerabilities that require systematic safeguards.
Yulin Shen, Xudong Pan, Geng Hong et al.
This exposes a critical security gap in MCP deployments for AI safety and tool-augmented agents, representing a novel attack vector rather than an incremental improvement.
Yuepeng Hu, Yuqi Jia, Mengyuan Li et al.
For the security of LLM agent ecosystems, this work reveals a critical vulnerability in tool code implementations that current defenses cannot address.
Quanchen Zou, Moyang Chen, Zonghao Ying et al.
This work addresses a systemic flaw in LVLM safety for users relying on secure AI systems, representing a novel attack paradigm rather than an incremental improvement.
Yukun Jiang, Yage Zhang, Michael Backes et al.
For developers and users of LLM-based agents, this work highlights a critical safety gap in skill ecosystems and provides a benchmark to evaluate defenses against harmful skills.
Yihao Zhang, Kai Wang, Jiangrong Wu et al.
This work addresses the critical security vulnerability of multi-turn jailbreaking in LLMs, which is more covert and persistent than single-turn attacks, and provides both an attack framework and a defense strategy.
Jiacheng Liang, Yao Ma, Tharindu Kumarage et al. · amazon-science
For LLM safety alignment, ARES provides a new paradigm for identifying and fixing systemic weaknesses in RLHF that existing red-teaming methods miss.
Davis Brown, Mahdi Sabbaghi, Luze Sun et al.
For AI safety researchers, this work highlights a practical attack vector and provides benchmarks to evaluate defenses against covert misuse.
Ziqing Yang, Yixin Wu, Rui Wen et al.
For security researchers and adversaries, this work provides a practical method to identify guardrails in black-box AI agents, enabling targeted attacks.
Alexander Panfilov, Peter Romov, Igor Shilov et al.
This work addresses the automation of safety and security research for LLMs, demonstrating incremental progress in adversarial red-teaming.
Ziqiao Kong, Wanxu Xia, Chong Wang et al.
This addresses recurring vulnerabilities in decentralized finance smart contracts for auditors and developers, offering a systematic approach built on shared abstractions rather than a purely incremental one.
Zekun Fei, Zihao Wang, Weijie Liu et al.
For users of remotely hosted MoE LLM services, this attack reveals a new vulnerability that bypasses safety alignment through input-only perturbations, posing a security risk to real-world deployments.
Zhenlin Xu, Xiaogang Zhu, Yu Yao et al.
This exposes critical security risks in LLM agent systems, affecting developers and users by enabling attacks that bypass safety constraints.
Wei Zou, Mingwen Dong, Miguel Romero Calvo et al.
This addresses a critical security problem for AI-powered web browsers and agents, revealing a realistic and persistent threat that is a novel attack vector rather than an incremental variation.
Yechao Zhang, Shiqian Zhao, Jie Zhang et al.
This addresses a critical security problem for users of personal AI agents, revealing an inherent architectural flaw that enables silent memory pollution without requiring prompt injection.