CROct 16, 2025
Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM JailbreaksChenYu Wu, Yi Wang, Yang Liao
Large language models (LLMs) are increasingly vulnerable to multi-turn jailbreak attacks, where adversaries iteratively elicit harmful behaviors that bypass single-turn safety filters. Existing defenses predominantly rely on passive rejection, which either fails against adaptive attackers or overly restricts benign users. We propose a honeypot-based proactive guardrail system that transforms risk avoidance into risk utilization. Our framework fine-tunes a bait model to generate ambiguous, non-actionable but semantically relevant responses, which serve as lures to probe user intent. Combined with the protected LLM's safe reply, the system inserts proactive bait questions that gradually expose malicious intent through multi-turn interactions. We further introduce the Honeypot Utility Score (HUS), measuring both the attractiveness and feasibility of bait responses, and use a Defense Efficacy Rate (DER) for balancing safety and usability. Initial experiment on MHJ Datasets with recent attack method across GPT-4o show that our system significantly disrupts jailbreak success while preserving benign user experience.
CRApr 30, 2025
Optimizing Mouse Dynamics for User Authentication by Machine Learning: Addressing Data Sufficiency, Accuracy-Practicality Trade-off, and Model Performance ChallengesYi Wang, Chengyv Wu, Yang Liao et al.
User authentication is essential to ensure secure access to computer systems, yet traditional methods face limitations in usability, cost, and security. Mouse dynamics authentication, based on the analysis of users' natural interaction behaviors with mouse devices, offers a cost-effective, non-intrusive, and adaptable solution. However, challenges remain in determining the optimal data volume, balancing accuracy and practicality, and effectively capturing temporal behavioral patterns. In this study, we propose a statistical method using Gaussian kernel density estimate (KDE) and Kullback-Leibler (KL) divergence to estimate the sufficient data volume for training authentication models. We introduce the Mouse Authentication Unit (MAU), leveraging Approximate Entropy (ApEn) to optimize segment length for efficient and accurate behavioral representation. Furthermore, we design the Local-Time Mouse Authentication (LT-AMouse) framework, integrating 1D-ResNet for local feature extraction and GRU for modeling long-term temporal dependencies. Taking the Balabit and DFL datasets as examples, we significantly reduced the data scale, particularly by a factor of 10 for the DFL dataset, greatly alleviating the training burden. Additionally, we determined the optimal input recognition unit length for the user authentication system on different datasets based on the slope of Approximate Entropy. Training with imbalanced samples, our model achieved a successful defense AUC 98.52% for blind attack on the DFL dataset and 94.65% on the Balabit dataset, surpassing the current sota performance.