Zhantong Xue

3papers

3 Papers

6.2CRMay 5

ZK-Value: A Practical Zero-Knowledge System for Verifiable Data Valuation

Zhaoyu Wang, Pingchuan Ma, Zhantong Xue et al.

Data valuation is a foundational task in data marketplaces, where a Shapley-value attribution determines how a buyer's payment is distributed among data providers. Typically, the marketplace operator runs this attribution alone, requiring participants and external auditors to trust scores they cannot independently recompute on the underlying private data. While zero-knowledge proofs (ZKPs) can theoretically reconcile this conflict between privacy and verifiability, existing ZK valuation systems fail to scale to real-world marketplace demands due to prohibitive proving times or the requirement to disclose validation cohorts. We present ZK-Value, a practical, end-to-end ZK data-valuation system. Our solution bridges the scalability gap through a fully co-designed architecture: (1) LSH-Shapley, a locality-based valuation primitive that replaces expensive pairwise distance metrics with per-bucket collision counts; (2) ZK-LSH-Shapley, a tailored ZKP protocol that drastically reduces witness size by encoding these counts into bucket-level histograms rather than naive per-pair tensors; and (3) structural proof-system optimizations, specifically super-oracle batching and sparsity skipping. Evaluated across 12 standard datasets, ZK-Value delivers valuation quality on par with state-of-the-art baselines (within 0.033 AUROC of exact KNN-Shapley), while generating proofs in seconds to minutes and outperforming specialized ZK baselines by 12.6x to 68.1x in proving time, with verification in under 4.6 s.

12.7CRJun 12Code

From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails

Yuguang Zhou, Xunguang Wang, Pingchuan Ma et al.

LLM-based guardrails have emerged as a highly effective defense against prompt injection and jailbreak attacks in autonomous agents. However, we reveal that the very reasoning and task-following capabilities enabling this protection introduce a novel vulnerability: attackers can inject crafted data to trap the guardrail in extended reasoning loops, effectuating a systematic denial-of-service (DoS) attack. To systematically expose this threat, we design a beam-search optimization framework that crafts natural-language payloads to maximize guardrail reasoning length, utilizing an LLM proposer guided by a strategy bank. Based on the observation of guardrail's schema-following nature, we also provide another attack framework driven by mechanism-aware structural mutations with less computational load. The attack efficacy is systematically evaluated in two parts. First, in standalone evaluations, the attack generalizes across diverse guardrail architectures, safety templates, and agent benchmarks. Payloads optimized on a single open-source surrogate successfully transfer to eight leading model backbones (e.g., Claude, GPT, Gemini, DeepSeek, and Qwen), achieving a 13--63$\times$ token amplification. Second, in end-to-end real-world agent deployments (web, desktop, code, and multi-agent systems), the attack reveals up to a 148$\times$ latency amplification. We show that a single poisoned document can saturate shared guardrail infrastructures, effectively starving co-located agents and paralyzing the entire system. By uncovering this availability flaw, our work underscores the urgent need to develop cost-bounded, reasoning-robust guardrails.

17.6SEJun 23

AutoSpec: Safety Rule Evolution for LLM Agents via Inductive Logic Programming

Pingchuan Ma, Zhaoyu Wang, Zimo Ji et al.

Large language model (LLM) agents increasingly automate complex tasks by integrating language models with external tools and environments. However, their autonomy poses significant safety risks: agents may execute destructive commands, leak sensitive data, or violate domain constraints. Existing safety approaches face a fundamental tradeoff: hand-crafted rules are interpretable but brittle, with overly conservative rules blocking safe operations (high false positives) while permissive rules miss unsafe behaviors (high false negatives). Neural classifiers lack the interpretability required for safety-critical deployments. We present AutoSpec, a framework that automatically evolves deployed expert-designed safety rules from user safe/unsafe annotations through counterexample-guided inductive synthesis (CEGIS) guided by inductive logic programming (ILP). Starting from the expert rules and a stream of annotated traces, AutoSpec iteratively evaluates rules, mines false-positive and false-negative counterexamples, uses ILP to learn which predicates discriminate them, generates candidate rule edits, and verifies candidates to select the best revision. The key insight is that ILP efficiently identifies predicates that appear frequently in false negatives but rarely in false positives (or vice versa), dramatically pruning the exponential search space of rule edits. This continues until convergence, producing interpretable rules that balance precision and recall. We evaluate AutoSpec on 291 execution traces spanning code execution and embodied agent domains. AutoSpec raises rule F1 to 0.98 and 0.93 across the two domains, achieving up to 94% false positive reduction while maintaining high recall, and converges within 4-5 iterations. The ILP-guided approach achieves up to 4.8x higher F1 than heuristic CEGIS. The learned rules are human-readable, auditable, and generalize to unseen scenarios.